CS413 Q&A

CS413 DEC 2022
QUESTION 1
a) Differentiate the following with the aid of an example.
i. Single Value and Multivalued attribute. [4]
ii. Derived and Non-Derived Attribute. [4]
iii. Candidate Key and Super Key. [4]
b) i. How does a query tree represent a relational algebra expression? [4]
ii. Discuss any two rules for query optimisation, giving an example of when each rule
should be applied.
SOLUTIONS:
a)
i. Single Value and Multivalued attribute:
A single value attribute is an attribute that can have only one value for each entity instance. For
example, the "age" attribute of a person entity can have only one value for each person.
A multivalued attribute is an attribute that can have multiple values for each entity instance. For
example, the "phone number" attribute of a person entity can have multiple values (home phone, work
phone, mobile phone) for each person. ii. Derived and Non-Derived Attribute:
A derived attribute is an attribute that can be calculated or derived from other attributes in the same
entity or related entities. For example, the "total price" attribute of a sales order entity can be derived
by multiplying the "unit price" and "quantity" attributes.
A non-derived attribute is an attribute that cannot be calculated or derived from other attributes in the
same entity or related entities. For example, the "name" attribute of a person entity is a non-derived
attribute.
iii. Candidate Key and Super Key:
A super key is any set of attributes that uniquely identifies each tuple in a relation; it may contain
more attributes than are needed for uniqueness. For example, in a student relation where "student ID"
is unique, the combination of "student ID" and "phone number" is a super key.
A candidate key is a minimal super key: removing any attribute from it destroys uniqueness. In the
same student relation, "student ID" alone is a candidate key, and "email address" alone is another if
every student has a distinct email address. The combination of "student ID" and "email address" is a
super key but not a candidate key, because it is not minimal.
b)
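i. A query tree is a tree structure that corresponds to a relational algebra expression. The leaf nodes
represent the input relations, and each internal node represents a relational algebra operation (such
as selection, projection, join, union, intersection, or difference) applied to the results of its child
nodes. The edges represent the flow of intermediate results from the children up to the parent. The
tree is evaluated bottom-up: an internal node is executed once its operands are available, and the
root produces the result of the query. For example, π name (σ salary > 15000 (Employees)) is a tree
with Employees as the leaf, σ above it, and π at the root.
ii. Two rules for query optimization are: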
1. Pushing down selection: Move selection operations as far down the query tree as possible so that
fewer tuples reach the subsequent operations. For example, for a query that finds all customers who
live in New York and have placed an order in the last month, we apply the selection on "New York" to
the customer relation and the selection on the order date to the order relation before joining them,
rather than joining all customers with all orders and filtering afterwards. Apply this rule whenever a
selection condition refers to the attributes of only one of the relations being joined.
2. Join ordering: Choose the order of the join operations based on the size and selectivity of the
relations involved, so that intermediate results stay small. For example, in a query that joins three
relations A, B, and C, we join A and B first if that join produces the smallest intermediate result, and
we avoid starting with a pair that has no common attribute, because that would be a Cartesian
product. Apply this rule when a query contains more than one join, since the choice of join ordering
can significantly affect the performance of a query.
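The effect of pushing selections down can be illustrated with a small sketch. The customer and order rows, the column names (`cid`, `city`, `recent`) and the `join` helper are all hypothetical and invented for this illustration; the "cost" is simply the number of tuple pairs a nested-loop join compares.

```python
# Hypothetical sample data, invented for this sketch (not from the exam paper).
customers = [
    {"cid": 1, "city": "New York"},
    {"cid": 2, "city": "Boston"},
    {"cid": 3, "city": "New York"},
    {"cid": 4, "city": "Chicago"},
]
orders = [
    {"oid": 10, "cid": 1, "recent": True},
    {"oid": 11, "cid": 2, "recent": True},
    {"oid": 12, "cid": 3, "recent": False},
    {"oid": 13, "cid": 4, "recent": True},
]


def join(left, right):
    """Nested-loop equi-join on cid; also returns the number of pairs compared."""
    pairs = [(l, r) for l in left for r in right]
    result = [{**l, **r} for l, r in pairs if l["cid"] == r["cid"]]
    return result, len(pairs)


# Plan 1: join everything first, apply the selections afterwards.
joined, cost_late = join(customers, orders)
late = [t for t in joined if t["city"] == "New York" and t["recent"]]

# Plan 2: push both selections below the join.
new_yorkers = [c for c in customers if c["city"] == "New York"]
recent_orders = [o for o in orders if o["recent"]]
early, cost_early = join(new_yorkers, recent_orders)

print(late == early, cost_late, cost_early)
```

Both plans return the same tuples, but the second plan compares fewer pairs because each input to the join has already been filtered.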
QUESTION 2:
The tables below are currently held on a database at the single site called CTR. Following reorganization
it is intended to distribute the journals held at the central database across 3 new branch libraries
located at remote sites called GTR; UTC and TWE. The central library becomes the HQ (an
administrative centre) meaning that it no longer keeps or loans out any journals itself. Instead journals
are made available for loan to borrowers registered at any of the 3 new sites.
JOURNAL
JournalID  JournalName
3215       Database Weekly
3216       Database Journal
3217       Oracle News
3218       ACM TODS
ARTICLE
ArticleID  ArticleTitle             AuthorID  JournalID
3215       ObjectOriented Analysis  23        3216
2409       Oracle indexing          18        3217
1398       DBA performance tools    23        3216
1289       Pioneers of Databases    23        3215
2554       Query optimisation       67        3216
1678       Daplex                   18        3218
4561       Niam Fact model          18        3218
AUTHOR
AuthorID  AuthorFName  AuthorSname
23        Norman       Gray
18        Carlos       Santos
67        Modrich      Neuman
LOAN
ArticleID  BorrowerID  LoanDate  ReturnDate
2409       43          3/1/15    4/2/15
1398       43          3/1/15    24/1/15
1289       17          6/2/15    8/2/15
2554       26          1/2/15    12/2/15
2409       43          14/2/15
2554       52          14/2/15
BORROWER
BorrowerID  BorrowerFname  BorrowerLname  BorrowerTelNo
A52         Jane           Green          0156387562
43          Fred           Briggs         01985-86722
17          Henry          Dhura          01582-74238
26          Jonas          Smith          01933-632001
a) Describe three different proposals for data distribution of the central database (CTR).
[9]
Hint: Show the distribution/ replication of table fragments/partitions and explain any trade-off and
pros/cons you think are relevant.
SOLUTION:
Proposal 1 (full replication): Place a complete copy of the entire database at each of the three new
branch libraries (GTR, UTC, and TWE). Every query can then be answered locally, which improves read
performance and availability and reduces network traffic for queries. However, this approach needs
the most storage at each site, and every update (for example a new loan) must be propagated to all
three copies, which makes updates expensive and consistency harder to maintain.
Proposal 2 (horizontal fragmentation): Partition the Journal table by JournalID and place each fragment
at a different site. For example, Journals 3215 and 3216 could be held at GTR, Journal 3217 at UTC, and
Journal 3218 at TWE, with the Article and Loan rows following the journal they belong to. Each site
stores and updates only its own data, so there is no redundancy and little update traffic. However, a
borrower who wants journals held at several sites causes queries against multiple databases, and the
data held at a site is unavailable if that site fails.
Proposal 3 (partial replication): Replicate the Article table at each site but keep the Journal table
centralized at CTR, which remains the administrative centre. Borrowers at any site can search all
articles in their local copy, and the journal catalogue has a single authoritative copy. However,
looking up journal details needs remote access to CTR, which becomes a single point of failure, and
the copies of the Article table may become inconsistent if updates are not properly synchronized
across all sites.
Overall, there are trade-offs between data distribution strategies that must be considered based on
factors such as performance requirements, storage capacity, network bandwidth limitations, and cost.
b) Consider the following relations containing airline flight information:
Flights (flno: integer, from: string, to: string, distance: integer, departs: time,
arrives: time)
Aircraft (aid: integer, aname: string, cruisingrange: integer)
Certified (eid: integer, aid: integer)
Employees (eid: integer, ename: string, salary: integer)
Note that the Employees relation describes pilots and other kinds of
employees as well; every pilot is certified for some aircraft (otherwise, he or she would not qualify as a
pilot), and only pilots are certified to fly. Write the following queries in relational algebra:
i. Find the eid of pilots certified for some Boeing aircraft. [2]
ii. Find the names of pilots certified for some Boeing aircraft. [2]
iii. Identify the flights that can be piloted by every pilot whose salary is
more than $100,000. [2]
c) Write the above queries in (b.) in tuple relational calculus. [6]
SOLUTION:
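b) π denotes projection, σ selection, ⨝ natural join, × Cartesian product and ÷ division.
i. π eid (σ aname='Boeing' (Aircraft ⨝ Certified))
ii. π ename (σ aname='Boeing' (Aircraft ⨝ Certified ⨝ Employees))
iii. A flight qualifies if every pilot earning more than $100,000 is certified for some aircraft whose
cruising range covers the distance of the flight. Let
CanFly = π flno, eid (σ distance ≤ cruisingrange (Flights × (Aircraft ⨝ Certified)))
RichPilots = π eid (σ salary>100000 (Employees ⨝ Certified))
Then the answer is CanFly ÷ RichPilots.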
c)
i. {t | ∃c ∈ Certified ∃a ∈ Aircraft (a.aid = c.aid ∧ a.aname = 'Boeing' ∧ t.eid = c.eid)}
ii. {t | ∃e ∈ Employees ∃c ∈ Certified ∃a ∈ Aircraft (e.eid = c.eid ∧ c.aid = a.aid ∧
a.aname = 'Boeing' ∧ t.ename = e.ename)}
iii. {t | ∃f ∈ Flights (t.flno = f.flno ∧ ∀e ∈ Employees ((e.salary > 100000 ∧
∃c ∈ Certified (c.eid = e.eid)) → ∃c ∈ Certified ∃a ∈ Aircraft (c.eid = e.eid ∧ c.aid = a.aid ∧
a.cruisingrange ≥ f.distance)))}
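One way to sanity-check the three queries is to run equivalent SQL against a tiny in-memory SQLite database. This is only a sketch: all rows are invented, and the Flights table is reduced to the two columns the queries need. The "every pilot" condition of query (iii) is expressed with a double NOT EXISTS, which mirrors the universal quantifier.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Aircraft (aid INTEGER, aname TEXT, cruisingrange INTEGER);
CREATE TABLE Certified (eid INTEGER, aid INTEGER);
CREATE TABLE Employees (eid INTEGER, ename TEXT, salary INTEGER);
CREATE TABLE Flights (flno INTEGER, distance INTEGER);
-- Hypothetical rows, invented for this sketch.
INSERT INTO Aircraft VALUES (1, 'Boeing', 5000), (2, 'Airbus', 3000);
INSERT INTO Employees VALUES (10, 'Ann', 120000), (11, 'Bob', 150000), (12, 'Cy', 50000);
INSERT INTO Certified VALUES (10, 1), (11, 2), (12, 1);
INSERT INTO Flights VALUES (100, 2500), (200, 4000);
""")

# (i) eids of pilots certified for some Boeing aircraft.
boeing_eids = [r[0] for r in con.execute("""
    SELECT DISTINCT c.eid FROM Certified c JOIN Aircraft a ON a.aid = c.aid
    WHERE a.aname = 'Boeing' ORDER BY c.eid""")]

# (ii) names of those pilots.
boeing_names = [r[0] for r in con.execute("""
    SELECT DISTINCT e.ename FROM Employees e
    JOIN Certified c ON c.eid = e.eid JOIN Aircraft a ON a.aid = c.aid
    WHERE a.aname = 'Boeing' ORDER BY e.ename""")]

# (iii) flights for which no well-paid pilot lacks a suitable aircraft.
flights_for_all = [r[0] for r in con.execute("""
    SELECT f.flno FROM Flights f
    WHERE NOT EXISTS (
        SELECT 1 FROM Employees e
        WHERE e.salary > 100000
          AND EXISTS (SELECT 1 FROM Certified c WHERE c.eid = e.eid)
          AND NOT EXISTS (
              SELECT 1 FROM Certified c JOIN Aircraft a ON a.aid = c.aid
              WHERE c.eid = e.eid AND a.cruisingrange >= f.distance))
    ORDER BY f.flno""")]

print(boeing_eids, boeing_names, flights_for_all)
```

With this sample data, flight 200 is excluded because one of the two well-paid pilots is certified only for an aircraft with a cruising range of 3000.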
QUESTION 3:
a) A library stores information about books and authors, among other things that we
disregard for this problem. We assume for simplicity that books can
be uniquely identified by their title, and authors by their name. For books,
the database also stores the year it was first published. A book version is a
version of a book in a specific language, on a specific media (paper version,
audio book, digital version, etc). For each version, the database stores its
unique ISBN number, as well as the media and language of the version. Book
versions are also categorized, for easy indexing. There are a lot of
categories a book version can be tagged with (e.g. fiction, geography, child
novel, short version, etc), and a book can have any number of
categorizations. Note specifically that it's the book versions that are
categorized, so different versions of the same book need not be tagged with the same
set of categories. The following relation sums up all the attributes that should be
stored in the database:
Library(authorName, bookTitle, yearPub, ISBN,media, language, category)
i. Find all dependencies and independencies (multi-valued dependencies) that
are expected to hold for this domain given the domain description above. [8]
ii. Do a complete decomposition of Library so that the resulting schema fulfills
4NF. [5]
b) Let transactions T1, T2 and T3 be defined to perform the following operations:
T1: Add one to A
T2: Double A
T3: Display A on the screen and then set A to one.
(where A is some item in the database) Suppose transactions T1, T2 and T3 are allowed to execute
concurrently. If A has initial value zero, how many possible correct results are there? Enumerate
them. [12]
SOLUTION:
a)
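i. Functional dependencies:
- bookTitle -> yearPub (a book is identified by its title and has one year of first publication)
- ISBN -> bookTitle, media, language (each version has a unique ISBN and is a version of exactly
one book); by transitivity, ISBN -> yearPub
Note that authorName does not determine bookTitle, because an author can write many books, and
ISBN does not determine category, because a version can be tagged with many categories.
Multi-valued dependencies (independencies):
- ISBN ->> category (the categories of a version are independent of its other attributes)
- bookTitle ->> authorName (assuming a book may have several authors, the authors of a book are
independent of its versions and their categories); it follows that ISBN ->> authorName
ii. Decomposition into 4NF:
1. Book(bookTitle, yearPub)
2. BookVersion(ISBN, bookTitle, media, language)
3. BookAuthor(bookTitle, authorName)
4. BookCategory(ISBN, category)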
b)
Possible correct results:
A concurrent execution is correct if it is serializable, that is, equivalent to some serial execution of
T1, T2 and T3. There are 3! = 6 serial orders, and with A initially zero they give:
1. T1, T2, T3: T3 displays 2, final A = 1
2. T1, T3, T2: T3 displays 1, final A = 2
3. T2, T1, T3: T3 displays 1, final A = 1
4. T2, T3, T1: T3 displays 0, final A = 2
5. T3, T1, T2: T3 displays 0, final A = 4
6. T3, T2, T1: T3 displays 0, final A = 3
Explanation:
Each serial order gives a different combination of displayed value and final value, so there are six
possible correct results. If only the final value of A is considered, the distinct values are 1, 2, 3 and 4.
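The correct results are those of the serial executions, and these can be enumerated mechanically. The sketch below models each transaction as a hypothetical function that takes the current value of A and the list of values displayed so far; the function names and the `results` dictionary are illustrative only.

```python
from itertools import permutations


def t1(a, shown):
    """T1: add one to A."""
    return a + 1, shown


def t2(a, shown):
    """T2: double A."""
    return a * 2, shown


def t3(a, shown):
    """T3: display A, then set A to one."""
    return 1, shown + [a]


results = {}
for order in permutations([("T1", t1), ("T2", t2), ("T3", t3)]):
    a, shown = 0, []  # A starts at zero, nothing displayed yet
    for _, step in order:
        a, shown = step(a, shown)
    # Record (value displayed by T3, final value of A) for this serial order.
    results[" ".join(name for name, _ in order)] = (shown[0], a)

for schedule, (displayed, final) in results.items():
    print(schedule, "displays", displayed, "final A =", final)
```

The six serial orders yield six distinct pairs of displayed value and final value, while the final value of A alone takes only four distinct values.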
Question 4
Suppose you and your friends are starting an e-commerce company which sells various kinds of
products in daily life like perfume and toys online. Now you are trying to design the company's website.
Based on the following requirements, design an ER diagram for the database of the website. For each
binary relationship you identified, state the cardinalities (1:1, 1:m or m:n) on the entities
participating in this relationship. [18]
• The database maintains the information of customers, including the customer's name, email
address, shipping address, billing address, credit card number, and phone number. In order to arrange
the shipment efficiently and reduce the cost, the shipping address is composed of street, state and
zip code.
• There are two kinds of customers, registered customers and non-registered customers. Registered
customers are identified by their registered ids, and for each non-registered customer, a temporary
id is used.
• A product has a product id, a name, its price, a supplier (from where this product is purchased)
and a description. Each product is identified by the product id.
• Each product has a number of items. All the items of the same product are identical in appearance;
however, they differ in their item ids (imagine when you go to the supermarket: although you buy two
of the same thing, they have different barcodes). In addition, each item has a producing date. The
item id alone is not enough to distinguish different items from all kinds of products; instead, it must
be associated with its corresponding product id.
• Each customer can order many items at a time. When he/she is making an order, the date, time, and
total amount of that order will be recorded. The total amount is not stored information but is
calculated each time a customer makes an order, by adding all the prices of the items together.
• Each product belongs to one or more categories. For example, a photographer's book can belong to
both "book" and "photography". Each category includes many kinds of products. A category has its
category number and its category name, and is identified by the category number.
• For each registered customer, you will keep track of his/her favorite categories. This will be useful
when you suggest products for him/her in his/her future purchases. One customer can favorite one
or more categories, and for each of his/her favorites, you will keep a record of the number of
purchases he/she made in this category.
SOLUTION:
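Step 1: Entities and attributes
Customer(
customerId int primary key,
name varchar(255),
email varchar(255),
shippingStreet varchar(255),
shippingState varchar(255),
shippingZipCode varchar(255),
billingAddress varchar(255),
creditCardNumber varchar(255),
phoneNumber varchar(255),
type varchar(255)
)
The type is either registered or non-registered; customerId holds the registered id or the temporary id.
Product(
productId int primary key,
name varchar(255),
price decimal,
supplier varchar(255),
description text
)
Item(
productId int,
itemId int,
producingDate date,
primary key (productId, itemId),
foreign key (productId) references Product(productId)
)
Item is a weak entity: the item id identifies an item only together with its product id.
Order(
orderId int primary key,
customerId int,
orderDate date,
orderTime time,
foreign key (customerId) references Customer(customerId)
)
The orderId is an assumed identifier. The total amount is a derived attribute and is not stored.
Category(
categoryNumber int primary key,
categoryName varchar(255)
)
Step 2: Relationships and cardinalities
Product has Item: 1:m
Customer places Order: 1:m
Order contains Item: 1:m
Product belongs to Category: m:n
Registered Customer favorites Category: m:n, with the relationship attribute numberOfPurchases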
Question 5
a) Discuss the key characteristics of a data warehouse and how it differs in content,
structure and function from an on-line transaction processing (OLTP) database. You should
support your discussion with suitable diagrams and examples. [10]
b) Describe any three types of knowledge produced from data mining giving an example
of each. [6]
SOLUTION:
a)
Key Characteristics of a Data Warehouse:

1. Subject-Oriented: A data warehouse is designed to support business analysis and decision
making by providing a comprehensive view of the organization's data from different perspectives or
subjects, such as sales, marketing, finance, and operations.
2. Integrated: A data warehouse integrates data from various sources, such as operational
databases, legacy systems, and external sources. The integration process involves cleaning,
transforming, and consolidating the data to ensure consistency and accuracy.
3. Time-Variant: A data warehouse stores historical data over a long period of time to support
trend analysis and forecasting. It also captures snapshots of the data at different points in time to
provide a time-series view of the organization's performance.
4. Non-Volatile: A data warehouse is read-only and does not allow updates or deletions of the
stored data. This ensures that the historical data remains intact and can be used for analysis without
any changes.
Structure:
A typical data warehouse has three main components:
1. Data Sources: These are the various sources from which the raw data is extracted for processing
in the warehouse.
2. ETL (Extract-Transform-Load) Process: This process involves extracting the raw data from
different sources, transforming it into a consistent format, and loading it into the warehouse.
3. Data Warehouse Database: This is where all the processed and transformed data is stored in an
organized manner for easy retrieval.
Function:
The primary function of a data warehouse is to provide decision-makers with easy access to accurate
and relevant information for analysis and reporting purposes. It enables organizations to make
informed decisions based on historical trends and patterns in their business operations.
On-line Transaction Processing (OLTP) Database:
An OLTP database is designed for transactional processing that involves frequent updates, insertions,
deletions, or modifications of small amounts of operational data in real-time. It supports day-to-day
business operations such as order processing, inventory management, and customer service.
Differences between Data Warehouse and OLTP Database:
1. Purpose: A data warehouse is designed for analytical processing, while an OLTP database is
designed for transactional processing.
2. Data Volume: A data warehouse stores large volumes of historical data, while an OLTP database
stores current operational data.
3. Data Structure: A data warehouse has a denormalized structure that supports complex queries
and analysis, while an OLTP database has a normalized structure that supports efficient transaction
processing.
4. Performance: A data warehouse is optimized for read-intensive operations, while an OLTP
database is optimized for write-intensive operations.
b)
Types of Knowledge Produced from Data Mining:
1. Descriptive Knowledge: This type of knowledge describes the patterns and relationships in the
data without making any predictions or decisions. For example, a retailer may use data mining to
identify the most popular products sold in different regions or seasons.
2. Predictive Knowledge: This type of knowledge uses statistical models and algorithms to make
predictions about future events or outcomes based on historical data. For example, a bank may use
data mining to predict which customers are likely to default on their loans based on their credit history
and other factors.
3. Prescriptive Knowledge: This type of knowledge provides recommendations or solutions based
on the analysis of historical data. For example, a healthcare provider may use data mining to
recommend personalized treatment plans for patients based on their medical history and genetic
profile.
DEC 2019
QUESTION 1:
a) Using your own simple examples and suitable diagrams, explain the following warehousing and
big data concepts.
i. Data Cleansing,
ii. Indexing & Optimization,
iii. Materialized Views.
b) Examine deferred modification and immediate modification techniques, explaining how
recovery takes place in case of a failure in these techniques.
SOLUTION:
a)
i. Data Cleansing: Data cleansing is the process of identifying and correcting or removing inaccurate,
incomplete, or irrelevant data from a dataset. For example, if a company has a customer database with
duplicate entries, missing information, or incorrect data, data cleansing can help to clean up the
database and ensure that it is accurate and reliable.
Diagram:
Before Data Cleansing:
| Customer ID | Name | Address | Phone Number |
|-------------|------|---------|--------------|
| 001 | John Smith | 123 Main St. | 555-1234 |
| 002 | Jane Doe | 456 Oak Ave. | |
| 003 | John Smith | 789 Maple Rd. | |
After Data Cleansing:
| Customer ID | Name | Address | Phone Number |
|-------------|------|---------|--------------|
| 001 | John Smith | 123 Main St. | 555-1234 |
| 002 | Jane Doe | 456 Oak Ave. | 555-5678 |
| 003 | Mary Johnson | 789 Maple Rd. | 555-9012 |
ii. Indexing & Optimization: Indexing is the process of creating an index for a database table to
improve the speed and efficiency of queries. Optimization involves analyzing the database structure
and query patterns to identify ways to improve performance. Without an index on customer_id the
query below must scan every row of orders; with the index, the matching rows are located directly.
Diagram:
Before Indexing & Optimization:
SELECT * FROM orders WHERE customer_id = '123';
After Indexing & Optimization:
CREATE INDEX idx_customer_id ON orders (customer_id);
SELECT * FROM orders WHERE customer_id = '123';
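The change in access path can be observed with SQLite's `EXPLAIN QUERY PLAN`. The sketch below uses Python's standard `sqlite3` module with an in-memory `orders` table filled with invented rows; the exact wording of the plan text differs between SQLite versions, so only its general shape is inspected.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (order_id INTEGER, customer_id TEXT)")
# Hypothetical data: 1000 orders spread over 200 customer ids.
con.executemany("INSERT INTO orders VALUES (?, ?)",
                [(i, str(i % 200)) for i in range(1000)])

query = "SELECT * FROM orders WHERE customer_id = '123'"


def plan(sql):
    """Return the plan description SQLite reports for the statement."""
    return " ".join(row[-1] for row in con.execute("EXPLAIN QUERY PLAN " + sql))


before = plan(query)  # expected to describe a full table scan
con.execute("CREATE INDEX idx_customer_id ON orders (customer_id)")
after = plan(query)   # expected to mention the new index
rows = con.execute(query).fetchall()

print(before)
print(after)
print(len(rows))
```

The query returns the same rows before and after the index is created; only the way the rows are located changes.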
iii. Materialized Views: A materialized view is a database object that stores the results of a query
as a table-like structure. Materialized views can be used to improve query performance by
precomputing expensive queries and storing the results, which must be refreshed when the
underlying tables change.
Diagram:
Before Materialized Views:
SELECT customer_id, COUNT(*) FROM orders GROUP BY customer_id;
After Materialized Views:
CREATE MATERIALIZED VIEW mv_customer_orders AS
SELECT customer_id, COUNT(*) FROM orders GROUP BY customer_id;
SELECT * FROM mv_customer_orders;
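SQLite has no `CREATE MATERIALIZED VIEW` statement, so the sketch below only emulates the idea: the aggregate is stored in an ordinary table and recomputed by a hypothetical `refresh` helper. It illustrates the trade-off that the stored result is fast to read but becomes stale until it is refreshed. The rows are invented.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (order_id INTEGER, customer_id TEXT)")
con.executemany("INSERT INTO orders VALUES (?, ?)", [(1, "a"), (2, "a"), (3, "b")])


def refresh():
    """Recompute the stored result, as a materialized-view refresh would."""
    con.execute("DROP TABLE IF EXISTS mv_customer_orders")
    con.execute("CREATE TABLE mv_customer_orders AS "
                "SELECT customer_id, COUNT(*) AS n FROM orders GROUP BY customer_id")


def read_mv():
    """Read the precomputed counts without touching the orders table."""
    return dict(con.execute("SELECT customer_id, n FROM mv_customer_orders"))


refresh()
first = read_mv()
con.execute("INSERT INTO orders VALUES (4, 'b')")
stale = read_mv()   # still reflects the data as of the last refresh
refresh()
fresh = read_mv()

print(first, stale, fresh)
```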
b)
Deferred Modification: In deferred modification (NO-UNDO/REDO), the updates of a transaction are
recorded in the log but are not applied to the database until the transaction commits. An uncommitted
transaction therefore never changes the database.
Immediate Modification: In immediate modification (UNDO/REDO), updates may be applied to the
database before the transaction commits. Each update is first written to the log together with the old
and new values of the item (write-ahead logging), so that it can later be undone or redone.
Recovery: In case of a failure, recovery restores the database to a consistent state by scanning the
log. In deferred modification, transactions with a commit record in the log are redone using the
logged new values, and transactions without a commit record are simply ignored. In immediate
modification, transactions without a commit record are undone by restoring the logged old values in
reverse order, and committed transactions are redone. Checkpoints limit how far back the log has to
be scanned.
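The textbook log-based scheme behind the two techniques can be sketched as follows. The log format (tuples such as `("write", txn, item, old, new)`), the `recover` function and the sample values are assumptions made for this illustration, not part of any particular DBMS.

```python
def recover(db, log, immediate):
    """Return the database state after log-based recovery.

    db        -- item values found on disk after the crash
    log       -- records ("start", t), ("write", t, item, old, new), ("commit", t)
    immediate -- True for immediate modification (undo + redo),
                 False for deferred modification (redo only)
    """
    committed = {t for op, t, *_ in log if op == "commit"}
    state = dict(db)
    if immediate:
        # Undo writes of uncommitted transactions, newest first.
        for op, t, *rest in reversed(log):
            if op == "write" and t not in committed:
                item, old, _new = rest
                state[item] = old
    # Redo writes of committed transactions, oldest first.
    for op, t, *rest in log:
        if op == "write" and t in committed:
            item, _old, new = rest
            state[item] = new
    return state


# Hypothetical log: T1 committed, T2 was still active at the crash.
log = [
    ("start", "T1"), ("write", "T1", "A", 100, 90), ("commit", "T1"),
    ("start", "T2"), ("write", "T2", "B", 50, 70),
]

# Deferred: nothing had reached the disk before the crash.
deferred_result = recover({"A": 100, "B": 50}, log, immediate=False)
# Immediate: T2's write to B had already reached the disk, T1's write had not.
immediate_result = recover({"A": 100, "B": 70}, log, immediate=True)

print(deferred_result, immediate_result)
```

In both cases the recovered state contains the effect of the committed transaction T1 and none of the effects of the incomplete transaction T2.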
Question 4
a) Prove Armstrong's union rule. [5]
SOLUTION:
The union rule states that if X → Y and X → Z hold, then X → YZ also holds. It is derived from
Armstrong's axioms (reflexivity, augmentation and transitivity) as follows:
1. X → Y (given)
2. X → XY (augmentation of 1 with X, since XX = X)
3. X → Z (given)
4. XY → YZ (augmentation of 3 with Y)
5. X → YZ (transitivity of 2 and 4)
Therefore X → YZ holds whenever X → Y and X → Z hold, which proves the union rule.
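The union rule (if X → Y and X → Z then X → YZ) can also be checked with the standard attribute-closure algorithm: X → YZ follows from a set of FDs exactly when Y and Z are both in the closure of X. The sketch below is a minimal version in which attributes are single characters and each FD is a hypothetical (left side, right side) pair of strings.

```python
def closure(attrs, fds):
    """Return the set of attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left side is already determined, so is the right side.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


fds = [("X", "Y"), ("X", "Z")]
x_closure = closure("X", fds)
print(sorted(x_closure))
```

Since both Y and Z appear in the closure of X, the dependency X → YZ follows from the two given dependencies.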
APRIL 2017
Question 1
a) Explain the following terms in data warehousing.
i. Snowflake Schema,
ii. Fact Constellation,
iii. Data Marts.
b) Examine deferred modification and immediate modification technique for recovery
explaining how recovery takes place in case of a failure in these techniques. [10]
c) Given R (A, B, C, D, E) with the set of FDs, F = {AB → CD, A → E, C → D}. Is the decomposition
of R into R1 (A,B,C), R2 (B,C,D) and R3 (C,D,E) lossless? Prove. [10]
SOLUTION:
a) Snowflake Schema: A data warehouse schema in which a central fact table is connected to
dimension tables that are themselves normalized into multiple related tables. This reduces redundancy
in the dimensions but makes the schema more complex and can slow queries, because more joins are
needed.
Fact Constellation: Also known as a galaxy schema, it contains several fact tables that share some of
their dimension tables. It can be viewed as a collection of star schemas and is used when several
business processes (for example sales and shipping) are analyzed over common dimensions such as
time and product.
Data Marts: A data mart is a subset of a data warehouse that contains specific information for a
particular department or business unit. Data marts are designed for easy access and analysis of data
by end-users.
b) Deferred Modification Technique: In this technique, all modifications are first recorded in a log
file, and the database itself is not changed until the transaction commits. In case of failure,
transactions that committed before the failure are redone from the log; transactions that did not
commit are ignored, because none of their changes reached the database, so no undo is needed.
Immediate Modification Technique: In this technique, modifications may be applied to the database
before the transaction commits, but each modification is first recorded in the log together with the
old and new values. In case of failure, recovery undoes the changes of incomplete transactions using
the old values and redoes the changes of committed transactions using the new values. Checkpoints
taken at regular intervals limit the portion of the log that must be examined.
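c) A decomposition is lossless if the natural join of the fragments always reproduces exactly the
original relation. We test this with the chase (tableau) algorithm.
Step 1: Build a tableau with one row per fragment. Write a in a column if the fragment contains that
attribute and b otherwise (each b stands for a distinct symbol).
     A  B  C  D  E
R1   a  a  a  b  b
R2   b  a  a  a  b
R3   b  b  a  a  a
Step 2: Apply the FDs until nothing changes.
AB → CD: no two rows agree on both A and B, so nothing changes.
A → E: no two rows agree on A, so nothing changes.
C → D: all three rows agree on C, so their D values are made equal; the D entry of R1 becomes a.
A further pass over the FDs changes nothing.
Step 3: Inspect the final tableau.
     A  B  C  D  E
R1   a  a  a  a  b
R2   b  a  a  a  b
R3   b  b  a  a  a
No row consists entirely of a symbols, so the decomposition of R into R1, R2, and R3 is not lossless.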
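The lossless-join test for this decomposition can be automated with the chase (tableau) algorithm. The following is a minimal sketch in which attributes are single characters, each FD is a hypothetical (left side, right side) pair of strings, and `is_lossless` is an illustrative name.

```python
def is_lossless(attributes, fds, fragments):
    """Chase test: True if joining the fragments is lossless under the FDs."""
    # One tableau row per fragment: "a" is the distinguished symbol,
    # (row, attribute) is a unique non-distinguished symbol.
    rows = [{x: "a" if x in frag else (i, x) for x in attributes}
            for i, frag in enumerate(fragments)]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            for r1 in rows:
                for r2 in rows:
                    if r1 is r2 or any(r1[x] != r2[x] for x in lhs):
                        continue
                    # The rows agree on the left side: equate the right side.
                    for y in rhs:
                        if r1[y] != r2[y]:
                            keep = "a" if "a" in (r1[y], r2[y]) else r1[y]
                            drop = r2[y] if r1[y] == keep else r1[y]
                            for r in rows:
                                if r[y] == drop:
                                    r[y] = keep
                            changed = True
    # Lossless exactly when some row holds only distinguished symbols.
    return any(all(v == "a" for v in r.values()) for r in rows)


fds = [("AB", "CD"), ("A", "E"), ("C", "D")]
exam_case = is_lossless("ABCDE", fds, ["ABC", "BCD", "CDE"])
simple_case = is_lossless("ABC", [("A", "B")], ["AB", "AC"])
print(exam_case, simple_case)
```

The second call is a small control case: splitting R(A, B, C) with A → B into (A, B) and (A, C) is lossless because the shared attribute A is a key of the first fragment.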
Question 2
a) Discuss the key characteristics of a data warehouse and how it differs in content,
structure and function from an on-line transaction processing (OLTP) database. You should
support your discussion with suitable diagrams and examples. [12]
b) Describe the five types of knowledge produced from data mining giving an example of
each. [10]
SOLUTION:
2.a) Key Characteristics of a Data Warehouse:
A data warehouse is a large, centralized repository of data that is used to support business intelligence
activities such as reporting, analysis, and decision-making. The key characteristics of a data warehouse
are:
1. Subject-Oriented: A data warehouse is organized around specific subject areas or domains, such
as sales, marketing, finance, or inventory. This allows users to focus on the data that is relevant to their
area of interest.
2. Integrated: A data warehouse integrates data from multiple sources into a single, consistent view
of the organization's data. This eliminates redundancy and inconsistencies in the data.
3. Time-Variant: A data warehouse stores historical data over time so that users can analyze trends
and patterns in the data.
4. Non-Volatile: Once data is loaded into a data warehouse, it is not updated or deleted.
This ensures that historical information remains intact and can be used for analysis.
5. Designed for Querying and Analysis: A data warehouse is optimized for querying and analysis
rather than transaction processing. This means that queries can be run quickly and efficiently even on
large volumes of data.
Differences between Data Warehouse and OLTP Database:
An OLTP database is designed for transaction processing where the focus is on capturing transactions
quickly and accurately while ensuring consistency and integrity of the transactional information. On the
other hand, a Data Warehouse is designed for analytical processing where the focus is on providing
quick access to large volumes of historical information for analysis purposes.
OLTP databases are typically normalized to reduce redundancy while Data Warehouses are
denormalized to improve query performance by reducing joins between tables.
OLTP databases are optimized for read/write operations while Data Warehouses are optimized for
read-only operations.
OLTP databases store current operational information while Data Warehouses store historical
information over time.
[Diagram: differences between an OLTP database and a Data Warehouse (image not available)]
Example:
An example of an OLTP database is a bank's transactional system that records customer transactions
such as deposits, withdrawals, and transfers. An example of a Data Warehouse is a bank's data
warehouse that stores historical information about customer transactions over time for analysis
purposes.
2.b) Types of Knowledge Produced from Data Mining:
Data mining is the process of discovering patterns and insights in large datasets using statistical and
machine learning techniques. The five types of knowledge produced from data mining are:
1. Association Rules: Association rules identify relationships between variables in a dataset. For
example, if customers who buy product A also tend to buy product B, this can be used to create
cross-selling opportunities.
2. Clustering: Clustering identifies groups or clusters of similar objects in a dataset. For example,
clustering can be used to segment customers into different groups based on their buying behavior.
3. Classification: Classification predicts the class or category of an object based on its attributes. For
example, classification can be used to predict whether a customer will churn or not based on their
demographic and behavioral characteristics.
4. Regression: Regression predicts the value of a continuous variable based on its relationship with
other variables in the dataset. For example, regression can be used to predict sales revenue based on
advertising spend and other factors.
5. Anomaly Detection: Anomaly detection identifies unusual or unexpected patterns in a dataset
that may indicate fraud or other abnormal behavior. For example, anomaly detection can be used to
detect credit card fraud by identifying transactions that are significantly different from normal spending
patterns.
QUESTION 3:
a) Consider the following three linked tables that contain information about employees and the
projects they work on:
employees (empID, name, salary)
project (projNbr, title, budget)
workload (empID*, projNbr*, duration)
Consider the following query:
SELECT P.title, E.name
FROM employees E, project P, workload W
WHERE E.empID = W.empID
AND P.projNbr = W.projNbr
AND E.salary > 15000
AND W.duration < 20;
i. Draw an initial relational algebra tree for the above query. [4]
ii. Apply a series of transformations to the tree obtained in part (i) to make the query more
efficient. Discuss each step and state the heuristic used. [12]
b) Security is of paramount concern when using multi-user systems. Briefly explain the
following security issues that arise in a multi-user system:
i. authentication of users
ii. user privileges
iii. confidentiality of data
SOLUTION:
a) i.
```
π title, name (σ salary > 15000 ∧ duration < 20 ((employees ⋈ workload) ⋈ project))
```
ii.
1. Push the selection `salary > 15000` down to the `employees` table so that it is applied before any
join (heuristic: perform selections as early as possible). Fewer employee tuples then take part in the join.
```
σ salary > 15000 (employees)
```
2. Push the selection `duration < 20` down to the `workload` table for the same reason.
```
σ duration < 20 (workload)
```
3. Join the two reduced relations on `empID` (heuristic: execute the most restrictive selections and
joins first, so that intermediate results stay small).
```
σ salary > 15000 (employees) ⋈ σ duration < 20 (workload)
```
4. Push projections down so that each branch carries only the attributes needed for a later join or for
the final result (heuristic: perform projections as early as possible).
```
π empID, name (σ salary > 15000 (employees)) ⋈ π empID, projNbr (σ duration < 20 (workload))
```
5. Join the result with `project` on `projNbr` and project the final attributes.
```
π title, name ((π empID, name (σ salary > 15000 (employees)) ⋈ π empID, projNbr (σ duration < 20 (workload))) ⋈ π projNbr, title (project))
```
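One way to check that pushing selections and projections below the joins does not change the answer is to run both forms of the query on sample data. The following is a minimal sketch using Python's built-in `sqlite3` module; the sample rows are invented for illustration and the column types are assumptions, since the question gives only the attribute names.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (empID INTEGER PRIMARY KEY, name TEXT, salary INTEGER);
CREATE TABLE project (projNbr INTEGER PRIMARY KEY, title TEXT, budget INTEGER);
CREATE TABLE workload (
    empID INTEGER REFERENCES employees(empID),
    projNbr INTEGER REFERENCES project(projNbr),
    duration INTEGER,
    PRIMARY KEY (empID, projNbr)
);
INSERT INTO employees VALUES (1, 'Ann', 20000), (2, 'Bob', 12000), (3, 'Cy', 18000);
INSERT INTO project VALUES (10, 'Payroll', 5000), (20, 'Website', 9000);
INSERT INTO workload VALUES (1, 10, 15), (1, 20, 30), (2, 10, 5), (3, 20, 10);
""")

# The query exactly as written in the question.
original = """
SELECT P.title, E.name
FROM employees E, project P, workload W
WHERE E.empID = W.empID AND P.projNbr = W.projNbr
  AND E.salary > 15000 AND W.duration < 20
"""

# Selections and projections are applied inside the subqueries, before the joins.
pushed_down = """
SELECT P.title, E.name
FROM (SELECT empID, name FROM employees WHERE salary > 15000) AS E
JOIN (SELECT empID, projNbr FROM workload WHERE duration < 20) AS W
  ON E.empID = W.empID
JOIN (SELECT projNbr, title FROM project) AS P
  ON P.projNbr = W.projNbr
"""

rows_original = sorted(conn.execute(original).fetchall())
rows_pushed_down = sorted(conn.execute(pushed_down).fetchall())
print(rows_original == rows_pushed_down)
```

A real optimiser may choose its own plan regardless of how the SQL is written, so the sketch only demonstrates that the two expressions are equivalent, not that one runs faster.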
b)
i. Authentication of users refers to verifying their identity before granting access to a system
or resource. This can be done through various means such as passwords, biometrics, or smart cards.
ii. User privileges refer to the level of access granted to a user within a system or database. This can
range from read-only access to full administrative privileges depending on their role and
responsibilities.
iii. Confidentiality of data refers to ensuring that sensitive information is only accessible
to authorized users and protected from unauthorized access or disclosure. This can be achieved through
encryption, access controls, and regular security audits.
QUESTION 4:
Question 4
a) Suppose you are given a relation R = (A,B,C,D,E) with the following functional dependencies:
{CE → D, D → B, C → A}.
i. Identify the best normal form that R satisfies. [2]
ii. If the relation is not in BCNF, decompose it until it becomes BCNF. At each step, identify a new
relation, decompose and re-compute the keys and the normal forms they satisfy. [4]
b) Prove the Armstrong's union rule. [4]
SOLUTION:
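a) i. Taking the dependencies as CE → D, D → B and C → A, the only candidate key is CE: C and E
appear on no right-hand side, so every key must contain them, and CE+ = {A, B, C, D, E}. Because
C → A makes the non-prime attribute A depend on only part of the key, R violates 2NF. The best
normal form that R satisfies is therefore 1NF.
ii. R is not in BCNF because C → A and D → B are non-trivial dependencies whose left-hand sides are
not superkeys (CE → D is not a violation, since CE is a key).
Step 1: decompose R on C → A into R1 = (C, A) and R' = (B, C, D, E). R1 has the key C and is in BCNF.
R' has the key CE and the dependencies CE → D and D → B; D → B is a transitive dependency
(CE → D → B), so R' is in 2NF but not in 3NF or BCNF.
Step 2: decompose R' on D → B into R2 = (D, B) and R3 = (C, D, E). R2 has the key D and R3 has the
key CE; both are in BCNF.
The final decomposition is R1 = (C, A), R2 = (D, B), R3 = (C, D, E).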
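Keys and BCNF violations can be checked mechanically with attribute closures. The sketch below assumes the dependencies are CE → D, D → B and C → A; the function names are illustrative, and `bcnf_violations` is a simplification that only inspects the listed dependencies that fall entirely inside a sub-schema rather than computing the full projection of the dependency set.

```python
from itertools import combinations

def closure(attrs, fds):
    """Return the closure of `attrs` under `fds`, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(schema, fds):
    """Enumerate minimal attribute sets whose closure is the whole schema."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            candidate = set(combo)
            # Supersets of a key already found are not minimal.
            if any(key <= candidate for key in keys):
                continue
            if closure(candidate, fds) == schema:
                keys.append(candidate)
    return keys

def bcnf_violations(schema, fds):
    """Listed FDs inside `schema` whose left-hand side is not a superkey of it."""
    return [
        (lhs, rhs) for lhs, rhs in fds
        if lhs <= schema and rhs <= schema and not rhs <= lhs
        and (closure(lhs, fds) & schema) != schema
    ]

R = set("ABCDE")
fds = [({"C", "E"}, {"D"}), ({"D"}, {"B"}), ({"C"}, {"A"})]

keys = candidate_keys(R, fds)
violating = bcnf_violations(R, fds)
print(keys, violating)
```

Calling `bcnf_violations` on each relation of a proposed decomposition gives a quick sanity check that no listed dependency still violates BCNF there.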
b) Armstrong's union rule states that if X → Y and X → Z are both functional dependencies in a relation
R, then X → YZ is also a functional dependency in R.
Proof (using Armstrong's axioms):
1. X → Y (given).
2. X → XY (augmenting 1 with X, since XX = X).
3. X → Z (given).
4. XY → YZ (augmenting 3 with Y).
5. X → YZ (transitivity on 2 and 4).
The same result follows directly from the definition of a functional dependency. Let t1 and t2 be any
two tuples of R with t1[X] = t2[X]. Since X → Y holds, t1[Y] = t2[Y], and since X → Z holds,
t1[Z] = t2[Z]. Hence t1[YZ] = t2[YZ], so X → YZ holds in R.
Hence proved.
Question 5
The academic world is an interesting example of international cooperation and exchange. This problem
is concerned with modelling of a database that contains information on researchers, academic
institutions, and collaborations among researchers. A researcher can either be employed as a professor
or a lab assistant.
There are three kinds of professors: Assistant, associate, and full professors. The following should be stored:
■ For each researcher, his/her name, year of birth, and current position (if any).
■ For each institution, its name, country, and inauguration year.
■ For each institution, the names of its schools (e.g. School of Law, School of
Business, School of Computer Science, ...). A school belongs to exactly one institution.
■ An employment history, including information on all employments (start and end date, position,
and what school).
■ Information about co-authorships, i.e., which researchers have co-authored a
research paper. The titles of common research papers should also be stored.
■ For each researcher, information on his/her highest degree (BSc, MSc or PhD), including who
was the main supervisor, and at what school.
■ For each professor, information on what research projects (title, start date, and
end date) he/she is involved in, and the total amount of grant money for which he/she was the main
applicant.
Design and draw an ER diagram for the data sets described above. [20]
SOLUTION:
The ER diagram should contain the following entities and relationships.
Entities:
- Researcher (attributes: name, year of birth, current position), specialised into Professor and Lab
assistant; a professor is an assistant, associate, or full professor
- Institution (attributes: name, country, inauguration year)
- School (attributes: name)
- Employment (attributes: start date, end date, position)
- Research paper (attributes: title)
- Research project (attributes: title, start date, end date)
Relationships:
- A school belongs to exactly one institution; an institution has many schools (1:M)
- A researcher has many employments, and each employment is at exactly one school
- Researchers co-author research papers (M:N between Researcher and Research paper)
- Each researcher has one highest degree (BSc, MSc or PhD), linked to the main supervisor (another
researcher) and to the school where it was obtained
- Professors are involved in research projects (M:N); the total grant money for which a professor was
the main applicant is an attribute of Professor
In the diagram, entities are drawn as rectangles, relationships as diamonds connected to the
participating entities, and attributes as ovals attached to their entity or relationship.
QUESTION PAPER
QUESTION 1:
Question 1
a) Define the following
i. DDL
ii. DML
iii. Metadata [3x2]
b) Describe the shadow paging technique. [8]
c) Explain with the aid of an example or diagram, what is meant by the term "recursive relationship" in
ER modelling. [6]
SOLUTION:
a)
i. DDL stands for Data Definition Language. It is a set of commands used to define and manage the
structure of a database. DDL commands are used to create, modify, and delete database objects such
as tables, indexes, views, and procedures.
ii. DML stands for Data Manipulation Language. It is the set of commands used to retrieve and modify
the data stored in a database, such as SELECT, INSERT, UPDATE, and DELETE.
iii. Metadata refers to data that describes other data. In the context of databases,
metadata includes information about the structure of the database (such as table names, column
names, and data types), as well as information about the relationships between different tables.
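b) Shadow paging is a recovery technique used in database management systems to ensure transaction
atomicity and durability without relying on a log. The database is treated as a set of fixed-size pages
reached through a page table. When a transaction starts, the current page table is copied to a shadow
page table, which is saved on disk and never modified during the transaction.
When the transaction updates a page, the new contents are written to a free page on disk and the
current page table is changed to point to that page; the old page, still referenced by the shadow page
table, is left untouched. To commit, the modified pages and the current page table are flushed to disk
and the pointer to the active page table is switched atomically to the current page table. If the
transaction aborts or the system crashes before commit, the shadow page table is still in place, so
the database is simply in its state before the transaction and no undo or redo is required.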
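The copy-on-write idea behind shadow paging can be sketched in a few lines of Python. This is a toy in-memory model under simplifying assumptions (one transaction at a time, no real disk, no garbage collection of old pages); the class and method names are invented for illustration.

```python
class ShadowPagedStore:
    """Toy model: `disk` holds page versions; a page table maps logical
    page numbers to positions in `disk`."""

    def __init__(self, pages):
        self.disk = list(pages)                       # physical pages
        self.shadow_table = list(range(len(pages)))   # committed page table
        self.current_table = None                     # working table of the open transaction

    def begin(self):
        # The transaction works on a copy; the shadow table stays untouched.
        self.current_table = list(self.shadow_table)

    def write(self, page_no, data):
        # Copy-on-write: new contents go to a fresh physical page.
        self.disk.append(data)
        self.current_table[page_no] = len(self.disk) - 1

    def read(self, page_no):
        table = self.current_table if self.current_table is not None else self.shadow_table
        return self.disk[table[page_no]]

    def commit(self):
        # A single pointer switch makes all updates visible at once.
        self.shadow_table = self.current_table
        self.current_table = None

    def abort(self):
        # Dropping the current table restores the pre-transaction state.
        self.current_table = None


store = ShadowPagedStore(["a0", "b0"])

store.begin()
store.write(0, "a1")
store.abort()
after_abort = store.read(0)     # the aborted update is not visible

store.begin()
store.write(1, "b1")
store.commit()
after_commit = store.read(1)    # the committed update is visible
print(after_abort, after_commit)
```

In a real system the pointer switch in `commit` must be a single atomic disk write, which is what makes the technique crash-safe.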
c) A recursive (unary) relationship in ER modelling is a relationship in which the same entity type
participates more than once, each time in a different role. For example, consider an organisation
where employees report to other employees who act as their managers.
In this case there is only one entity type, Employee, because a manager is also an employee. The
relationship "manages" connects Employee to itself with the roles "manager" and "subordinate": one
employee can manage many employees, while each employee reports to at most one manager (1:M).
In an ER diagram it is drawn as a relationship diamond with both of its lines attached to the Employee
rectangle and labelled with the role names. In a relational schema it becomes a foreign key in the
Employee table that references the table's own primary key.
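A recursive relationship can be illustrated with a single table whose foreign key points back at itself. The sketch below uses Python's built-in `sqlite3` module; the table name, column names and sample rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (
    emp_id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    manager_id INTEGER REFERENCES employee(emp_id)  -- NULL for the top manager
);
INSERT INTO employee VALUES (1, 'Ann', NULL), (2, 'Bob', 1), (3, 'Cy', 1), (4, 'Dee', 2);
""")

# Self-join: the same table plays the subordinate role (e) and the manager role (m).
reports = conn.execute("""
    SELECT e.name, m.name
    FROM employee AS e
    JOIN employee AS m ON e.manager_id = m.emp_id
    ORDER BY e.emp_id
""").fetchall()
print(reports)
```

The two aliases `e` and `m` correspond to the two role names on the relationship lines in the ER diagram.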
QUESTION 2:
Question 2
a) Suppose you are given a relation R = (A,B,C,D,E) with the following functional dependencies:
{CE → D, D → B, C → A}.
i. Find all candidate keys. [2]
ii. Identify the best normal form that R satisfies. [2]
iii. If the relation is not in BCNF, decompose it until it becomes BCNF. At each step, identify a new
relation, decompose and re-compute the keys and the normal forms they satisfy. [4]
b) Prove the Armstrong's decomposition rule. [4]
c) Prove that a relation with two attributes is in BCNF. [8]
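Solution: a)
i. Taking the dependencies as CE -> D, D -> B and C -> A, the only candidate key is CE. Neither C nor E
appears on the right-hand side of any dependency, so both must be in every key, and
CE+ = {A, B, C, D, E}.
ii. R satisfies only 1NF: C -> A is a partial dependency of the non-prime attribute A on the key CE, so R
is not in 2NF.
iii. R is not in BCNF because the left-hand sides of C -> A and D -> B are not superkeys. Decomposing on
C -> A gives R1(C, A) and R'(B, C, D, E); R1 has key C and is in BCNF, while R' has key CE and still
violates BCNF (and 3NF) through D -> B. Decomposing R' on D -> B gives R2(D, B), with key D, and
R3(C, D, E), with key CE; both are in BCNF. The result is R1(C, A), R2(D, B), R3(C, D, E).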
b) Armstrong's decomposition rule states that if X -> YZ holds, then X -> Y and X -> Z also hold.
Proof:
1. X -> YZ (given).
2. YZ -> Y (reflexivity, since Y is a subset of YZ).
3. X -> Y (transitivity on 1 and 2).
4. YZ -> Z (reflexivity, since Z is a subset of YZ).
5. X -> Z (transitivity on 1 and 4).
Equivalently, from the definition: if two tuples r and s agree on X, then X -> YZ forces them to agree
on YZ, so they agree on Y and on Z separately.
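c) Let R(A,B) be a relation with a set of functional dependencies F. A relation is in BCNF if the
left-hand side of every non-trivial functional dependency that holds on it is a superkey. With only two
attributes, the only possible non-trivial dependencies are A -> B and B -> A, which gives four cases:
1. Neither holds: there are no non-trivial dependencies, the key is AB, and BCNF holds trivially.
2. Only A -> B holds: A determines every attribute of R, so A is a key and the left-hand side is a superkey.
3. Only B -> A holds: by the same argument B is a key.
4. Both hold: A and B are both keys, so both left-hand sides are superkeys.
In every case no non-trivial dependency has a left-hand side that is not a superkey.
Therefore, a relation with only two attributes is always in BCNF.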
QUESTION 3:
Question 3
a) Give the syntax for the following SQL statements
i. COUNT
ii. DROP
iii. USE
iv. ALTER
v. ORDER BY [1x5]
b) With the aid of a diagram, briefly explain the types of encryption that may be adopted in an
organization if it is to safeguard its data against unauthorised access. [15]
SOLUTION:
a)
i. COUNT:
SELECT COUNT(column_name) FROM table_name;
ii. DROP:
DROP TABLE table_name;
iii. USE:
USE database_name;
iv. ALTER:
ALTER TABLE table_name ADD column_name datatype;
v. ORDER BY:
SELECT column1, column2, ... FROM table_name ORDER BY column1 [ASC|DESC];
b)
There are two main types of encryption that may be adopted in an organization to safeguard its data
against unauthorized access: symmetric encryption and asymmetric encryption.
Symmetric encryption involves using the same key for both encrypting and decrypting data. This means
that anyone who has the key can access the data, so it is important to keep the key secure. Examples of
symmetric encryption algorithms include Advanced Encryption Standard (AES) and Data Encryption
Standard (DES).
Asymmetric encryption involves using two different keys: a public key and a private key. The public key
is used to encrypt data, while the private key is used to decrypt it. This means that even if someone
intercepts the encrypted data, they cannot decrypt it without the private key. Examples of asymmetric
encryption algorithms include RSA and Elliptic Curve Cryptography (ECC).
Here is a diagram illustrating how symmetric and asymmetric encryption work:
![encryption diagram](https://i.imgur.com/4Zy7z9d.png)
In addition to these types of encryption, organizations may also use other security measures such as
firewalls, access controls, and intrusion detection systems to protect their data from unauthorized
access.
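The public/private key idea behind asymmetric encryption can be shown with a toy RSA example. The numbers below are deliberately tiny textbook values chosen for illustration; they offer no security, and real systems rely on vetted cryptographic libraries with padding and much larger keys. The function names are illustrative.

```python
# Toy RSA with tiny primes, for illustration only (not secure).
p, q = 61, 53
n = p * q                      # modulus, shared by both keys
phi = (p - 1) * (q - 1)
e = 17                         # public exponent, coprime with phi
d = pow(e, -1, phi)            # private exponent: modular inverse of e (Python 3.8+)

def encrypt(message, public_key):
    """Encrypt an integer smaller than the modulus with the public key."""
    exponent, modulus = public_key
    return pow(message, exponent, modulus)

def decrypt(ciphertext, private_key):
    """Recover the integer with the matching private key."""
    exponent, modulus = private_key
    return pow(ciphertext, exponent, modulus)

message = 65
ciphertext = encrypt(message, (e, n))
recovered = decrypt(ciphertext, (d, n))
print(ciphertext != message, recovered == message)
```

Anyone may know `(e, n)` and encrypt, but only the holder of `d` can decrypt, which is the property the paragraph on asymmetric encryption describes.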
QUESTION 4:
Question 4
a) Explain using an example how rigorous Two Phase Locking protocol enforces serializability. [4]
b) Consider the following relations containing airline flight information:
Flights (flno: integer, from: string, to: string, distance: integer, departs: time, arrives: time)
Aircraft (aid: integer, aname: string, cruisingrange: integer)
Certified (eid: integer, aid: integer)
Employees (eid: integer, ename: string, salary: integer)
Note that the Employees relation describes pilots and other kinds of employees as
well; every pilot is certified for some aircraft (otherwise, he or she would not qualify as a pilot), and
only pilots are certified to fly.
Write the following queries in relational algebra:
i. Find the eids of pilots certified for some Boeing aircraft. [2]
ii. Find the names of pilots certified for some Boeing aircraft. [2]
iii. Find the aids of all aircraft that can be used on non-stop flights from Bonn to Madras. [2]
iv. Identify the flights that can be piloted by every pilot whose salary is more than
$100,000. [2]
c) Write the above queries in (b.) in tuple relational calculus. [8]
SOLUTION:
4.a) Under Two Phase Locking (2PL), a transaction acquires locks in a growing phase and releases them in
a shrinking phase; once it has released any lock it may not acquire another. Rigorous 2PL is stricter: a
transaction holds all of its locks, shared and exclusive, until it commits or aborts. Two transactions
with conflicting operations on the same data item can therefore never interleave those operations, and
every schedule is conflict-equivalent to the serial schedule in which the transactions appear in the
order of their commits.
For example, consider two transactions T1 and T2 that both want to update the same data item X. If T1
acquires the exclusive lock on X first, T2 has to wait until T1 commits (or aborts) and releases the lock
before it can read or write X, so T2 sees only T1's committed value and the outcome is the same as the
serial schedule T1, T2. Similarly, if T2 locks X first, T1 waits until T2 finishes and the outcome is
equivalent to the serial schedule T2, T1. Either way the execution is serializable.
4.b) i. πeid(Certified ⨝ σaname='Boeing'(Aircraft)) - This query joins the Certified and Aircraft relations
on aid and selects only those tuples where aname is 'Boeing'. It then projects only eid from this result
to get the eids of pilots certified for some Boeing aircraft.
ii. πename(σaname='Boeing'(Aircraft) ⨝ Certified ⨝ Employees) - This query first selects only those
tuples from Aircraft where aname is 'Boeing' and joins it with Certified on aid. It then joins this result
with Employees on eid to get ename of pilots certified for some Boeing aircraft.
iii. πaid(Aircraft ⨝cruisingrange≥distance σfrom='Bonn' ∧ to='Madras'(Flights)) - This query first
selects only those tuples from Flights where from is 'Bonn' and to is 'Madras'. It then joins this result
with Aircraft on the condition cruisingrange ≥ distance (a theta join, since the condition compares
attributes of both relations) to get the aids of all aircraft that can be used on non-stop flights
from Bonn to Madras.
iv. πflno,eid(Flights ⨝distance≤cruisingrange Aircraft ⨝ Certified) ÷ πeid(σsalary>100000(Employees) ⨝ Certified) -
The dividend pairs each flight with every pilot who is certified for at least one aircraft whose
cruising range covers the flight's distance. The divisor is the set of pilots (employees who appear in
Certified) earning more than $100,000. The division keeps only those flnos that are paired with every
eid in the divisor, i.e. the flights that can be piloted by every pilot whose salary is more than $100,000.
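4.c) In tuple relational calculus the variables range over tuples of a relation rather than over
individual attribute values:
i. {t | ∃c ∈ Certified ∃a ∈ Aircraft (c.aid = a.aid ∧ a.aname = 'Boeing' ∧ t.eid = c.eid)}
ii. {t | ∃e ∈ Employees ∃c ∈ Certified ∃a ∈ Aircraft (e.eid = c.eid ∧ c.aid = a.aid ∧ a.aname = 'Boeing'
∧ t.ename = e.ename)}
iii. {t | ∃a ∈ Aircraft ∃f ∈ Flights (f.from = 'Bonn' ∧ f.to = 'Madras' ∧ a.cruisingrange ≥ f.distance
∧ t.aid = a.aid)}
iv. {t | ∃f ∈ Flights (t.flno = f.flno ∧ ∀e ∈ Employees ((e.salary > 100000 ∧ ∃c ∈ Certified (c.eid = e.eid))
→ ∃c ∈ Certified ∃a ∈ Aircraft (c.eid = e.eid ∧ c.aid = a.aid ∧ a.cruisingrange ≥ f.distance)))}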
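The queries can be tried out on a small instance. The sketch below uses Python's built-in `sqlite3` module with invented sample rows; `origin` and `destination` stand in for the `from` and `to` attributes because those names are SQL keywords, and the `departs` and `arrives` columns are omitted for brevity. The "every pilot" query is written with nested `NOT EXISTS`, which mirrors the ∀ ... → form of the calculus expression.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Flights (flno INTEGER PRIMARY KEY, origin TEXT, destination TEXT, distance INTEGER);
CREATE TABLE Aircraft (aid INTEGER PRIMARY KEY, aname TEXT, cruisingrange INTEGER);
CREATE TABLE Certified (eid INTEGER, aid INTEGER, PRIMARY KEY (eid, aid));
CREATE TABLE Employees (eid INTEGER PRIMARY KEY, ename TEXT, salary INTEGER);
INSERT INTO Flights VALUES (1, 'Bonn', 'Madras', 7000), (2, 'Bonn', 'Paris', 400);
INSERT INTO Aircraft VALUES (10, 'Boeing', 9000), (11, 'Piper', 500);
INSERT INTO Employees VALUES (100, 'Ann', 120000), (101, 'Bob', 150000), (102, 'Cy', 50000);
INSERT INTO Certified VALUES (100, 10), (101, 11), (102, 10);
""")

def column(sql):
    """Run a query and return its first column as a sorted list."""
    return sorted(row[0] for row in conn.execute(sql))

# i. eids of pilots certified for some Boeing aircraft
boeing_eids = column("""
    SELECT DISTINCT c.eid
    FROM Certified c JOIN Aircraft a ON c.aid = a.aid
    WHERE a.aname = 'Boeing'""")

# iii. aircraft whose cruising range covers a Bonn to Madras flight (theta join)
bonn_madras_aids = column("""
    SELECT DISTINCT a.aid
    FROM Aircraft a JOIN Flights f ON a.cruisingrange >= f.distance
    WHERE f.origin = 'Bonn' AND f.destination = 'Madras'""")

# iv. keep a flight if there is no well-paid pilot who lacks
#     a certified aircraft able to cover the flight's distance
flights_for_all_rich_pilots = column("""
    SELECT f.flno FROM Flights f
    WHERE NOT EXISTS (
        SELECT 1 FROM Employees e
        WHERE e.salary > 100000
          AND EXISTS (SELECT 1 FROM Certified c WHERE c.eid = e.eid)
          AND NOT EXISTS (
              SELECT 1 FROM Certified c JOIN Aircraft a ON c.aid = a.aid
              WHERE c.eid = e.eid AND a.cruisingrange >= f.distance))""")
print(boeing_eids, bonn_madras_aids, flights_for_all_rich_pilots)
```

In this sample, flight 1 is excluded by query iv because one well-paid pilot is certified only for a short-range aircraft.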
QUESTION 5:
Question 5
The academic world is an interesting example of international cooperation and exchange. This problem
is concerned with modelling of a database that contains information on researchers, academic
institutions, and collaborations among researchers. A researcher can either be employed as a
professor or a lab assistant. There are three kinds of professors: Assistant, associate, and full
professors. The following should be stored:
■ For each researcher, his/her name, year of birth, and current position (if any).
■ For each institution, its name, country, and inauguration year.
■ For each institution, the names of its schools (e.g. School of Law, School of Business, School
of Computer Science, ...). A school belongs to exactly one institution.
■ An employment history, including information on all employments (start and end date,
position, and what school).
■ Information about co-authorships, i.e., which researchers have co-authored a
research paper. The titles of common research papers should also be stored.
■ For each researcher, information on his/her highest degree (BSc, MSc or PhD), including who
was the main supervisor, and at what school.
■ For each professor, information on what research projects (title, start date, and
end date) he/she is involved in, and the total amount of grant money for which he/she was the main
applicant.
Design and draw an ER diagram for the data sets described above. [20]
SOLUTION:
The ER diagram for the data sets described above can be described as follows.
Entities:
- Researcher
- Institution
- School
- Employment
- Co-authorship
- Research project
Attributes:
Researcher:
- Name
- Year of birth
- Current position
Institution:
- Name
- Country
- Inauguration year
School:
- Name
Employment:
- Start date
- End date
- Position
Co-authorship:
- Title of research paper
Research project:
- Title
- Start date
- End date
- Grant money amount
Relationships:
1. A researcher can have multiple employments.
2. An institution can have multiple schools.
3. A school belongs to exactly one institution.
4. Researchers can co-author research papers together.
5. A researcher can have only one highest degree.
6. A professor can be involved in multiple research projects.
7. A professor can be the main applicant for grant money in multiple research projects.
ER Diagram:
Researcher ----(1:M)---- Employment ----(M:1)---- School ----(M:1)---- Institution
Researcher ----(1:M)---- Co-authorship ----(M:1)---- Research paper
Professor ----(M:N)---- Research project
NEW QUESTION PAPER!!!
Question 1
a) Define the following: i. Query optimization ii. Meta data iii. Super key [3x2]
b) With the aid of a diagram illustrate the ANSI SPARC database architecture explaining
how it ensures database independence. [8]
c) What are the two integrity rules? Explain with examples how these rules are important
to enforce consistent database states. [6]
d) Differentiate deferred modification and immediate modification technique for recovery.
How does recovery take place in case of a failure in these techniques? [8]
SOLUTION:
a)
i. Query optimization: It is the process of selecting the most efficient query execution plan
from among several possible plans for a given query. The goal of query optimization is to minimize the
response time of a query and to reduce the resource utilization of the database system.
ii. Meta data: It refers to data that describes other data. In a database, metadata
includes information about tables, columns, indexes, constraints, and other database objects.
iii. Super key: It is a set of one or more attributes that uniquely identifies each record in a table. A
super key may contain attributes that are not needed for unique identification; a super key from
which no attribute can be removed without losing uniqueness is a candidate key.
b)
The ANSI SPARC database architecture consists of three levels:
1. External level: This level represents the user's view of the database and defines how users see and
access data.
2. Conceptual level: This level represents the logical structure of the entire database and defines how
data is organized and related to each other.
3. Internal level: This level represents how data is physically stored in the database system.
The ANSI SPARC architecture ensures data independence through the mappings between these levels.
Logical data independence means the conceptual schema can change (for example, by adding an
attribute) without changing the external schemas or application programs; only the
external/conceptual mapping is adjusted. Physical data independence means the internal schema can
change (for example, by adding an index or changing the file organisation) without changing the
conceptual schema; only the conceptual/internal mapping is adjusted.
c)
The two integrity rules are:
1. Entity integrity rule: It states that no attribute of a primary key may be null, so that every record
in a table can be uniquely identified. For example, in an employee table, each employee must have a
unique, non-null employee ID as their primary key.
2. Referential integrity rule: It states that every foreign key value must either match a primary key
value in the referenced table or be null. For example, if an employee table has a foreign
key referencing a department table's primary key, then every non-null value in this foreign key column
must match a value in the department table's primary key column.
Enforcing these rules is important to maintain consistent database states. For example, if the entity
integrity rule is violated and a table has null or duplicate primary keys, records can no longer be told
apart, which leads to data inconsistencies and errors. Similarly, if the referential integrity rule is
violated, it can result in orphaned records that refer to data that does not exist.
d)
Deferred modification technique: In this technique, all modifications made by a transaction are
recorded in the log, but the database itself is not updated until the transaction commits. At commit,
the log records are forced to disk and the updates are then applied to the database. In case of a
failure, transactions that had committed are redone from the log using the new values; transactions
that had not committed are simply ignored, because they never changed the database, so no undo is
needed (NO-UNDO/REDO).
Immediate modification technique: In this technique, modifications may be applied to the
database while the transaction is still active, but each update is first recorded in the log with both the
old and the new value (write-ahead logging). In case of a failure, recovery undoes the updates of
transactions that had not committed by restoring the old values, and redoes the updates of committed
transactions using the new values (UNDO/REDO).
The deferred modification technique makes recovery simpler because nothing ever has to be undone,
but a transaction's updates must be held back until it commits. Immediate modification lets updates
reach the database earlier, at the cost of logging old values and performing undo during recovery.
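The difference between the two recovery procedures can be sketched with a small log replay. This is a simplified model with an invented log record format (no checkpoints, a dictionary standing in for the database); the function name `recover` and the record layout are assumptions for illustration.

```python
def recover(database, log, technique):
    """Return the database state after crash recovery.

    `log` lists records in the order they were written:
      ("write", txn, item, old_value, new_value) or ("commit", txn).
    `technique` is "deferred" (redo only) or "immediate" (undo, then redo).
    """
    db = dict(database)
    committed = {rec[1] for rec in log if rec[0] == "commit"}

    if technique == "immediate":
        # Undo writes of uncommitted transactions, scanning the log backwards.
        for rec in reversed(log):
            if rec[0] == "write" and rec[1] not in committed:
                _, _, item, old_value, _ = rec
                db[item] = old_value

    # Redo writes of committed transactions, scanning the log forwards.
    for rec in log:
        if rec[0] == "write" and rec[1] in committed:
            _, _, item, _, new_value = rec
            db[item] = new_value
    return db


log = [
    ("write", "T1", "A", 100, 50),
    ("commit", "T1"),
    ("write", "T2", "B", 200, 250),   # T2 never committed
]

# Deferred: nothing reached the database before commit, so the crash leaves old values on disk.
deferred_result = recover({"A": 100, "B": 200}, log, "deferred")

# Immediate: assume T2's write to B reached disk before the crash, but T1's write to A did not.
immediate_result = recover({"A": 100, "B": 250}, log, "immediate")
print(deferred_result, immediate_result)
```

Both runs end in the same consistent state: T1's committed update is present and T2's uncommitted update is absent.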
Question 2
a) Consider the following relations with primary keys underlined.
Salesperson (SNo, Sname, Designation)
Area (ANo, Aname, ManagerNo)
Product (PNo, Pname, Cost)
SAP (SNo, ANo, PNo)
Define the schema in SQL, specifying the attributes and keys, assuming that ManagerNo is a foreign
key. Specify the constraint that the cost of a product cannot be greater than $10000. [6]
b) Consider the following relation:
A    B    C
10   b1   c1
10   b2   c2
11   b4   c1
12   b3   c4
13   b1   c1
14   b3   c4
Given the above state, which of the following dependencies hold in the above relation at this point of
time? If the dependency does not hold, explain why, by specifying the tuples that cause the violation.
i. A -> B
ii. B -> C
iii. C -> B
iv. B -> A
v. C -> A [5x2]
c) List the Armstrong's axioms for functional dependencies and explain what you understand by
soundness and completeness of these axioms. [7]
SOLUTION:
a) Schema:
Salesperson (SNo, Sname, Designation)
Area (ANo, Aname, ManagerNo)
Product (PNo, Pname, Cost)
SAP (SNo, ANo, PNo)
Attributes:
- Salesperson: SNo (primary key), Sname, Designation
- Area: ANo (primary key), Aname, ManagerNo (foreign key referencing Salesperson(SNo))
- Product: PNo (primary key), Pname, Cost
- SAP: SNo, ANo and PNo (composite primary key; each is also a foreign key referencing Salesperson,
Area and Product respectively)
Constraint:
- Cost <= 10000
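The schema description above can be turned into SQL statements. The sketch below runs them through Python's built-in `sqlite3` module so the CHECK constraint can be exercised; the column data types and the helper `insert_product` are assumptions, since the question specifies only names and keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE Salesperson (
    SNo INTEGER PRIMARY KEY,
    Sname TEXT NOT NULL,
    Designation TEXT
);
CREATE TABLE Area (
    ANo INTEGER PRIMARY KEY,
    Aname TEXT NOT NULL,
    ManagerNo INTEGER REFERENCES Salesperson(SNo)
);
CREATE TABLE Product (
    PNo INTEGER PRIMARY KEY,
    Pname TEXT NOT NULL,
    Cost NUMERIC CHECK (Cost <= 10000)
);
CREATE TABLE SAP (
    SNo INTEGER REFERENCES Salesperson(SNo),
    ANo INTEGER REFERENCES Area(ANo),
    PNo INTEGER REFERENCES Product(PNo),
    PRIMARY KEY (SNo, ANo, PNo)
);
""")

def insert_product(pno, pname, cost):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO Product VALUES (?, ?, ?)", (pno, pname, cost))
        return True
    except sqlite3.IntegrityError:
        return False

cheap_accepted = insert_product(1, "Perfume", 9500)
expensive_accepted = insert_product(2, "Yacht", 25000)
print(cheap_accepted, expensive_accepted)
```

The same CREATE TABLE statements should carry over to other SQL systems with minor changes to the data types.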
b) i. A -> B does not hold: the tuples (10, b1, c1) and (10, b2, c2) have the same value of A but
different values of B.
ii. B -> C holds: every value of B is paired with a single value of C (b1 with c1, b2 with c2, b3 with c4,
b4 with c1).
iii. C -> B does not hold: the tuples (10, b1, c1) and (11, b4, c1) have the same value of C but different
values of B.
iv. B -> A does not hold: the tuples (10, b1, c1) and (13, b1, c1) have the same value of B but different
values of A.
v. C -> A does not hold: the tuples (10, b1, c1) and (11, b4, c1) have the same value of C but different
values of A.
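Whether a dependency holds in a given relation state can be checked mechanically by looking for two tuples that agree on the left-hand side but differ on the right-hand side. A minimal Python sketch, assuming the relation state is the six tuples listed below (the helper name `violations` is illustrative):

```python
def violations(rows, lhs, rhs):
    """Return pairs of rows that agree on `lhs` but differ on `rhs`."""
    first_seen = {}
    bad_pairs = []
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if key in first_seen and first_seen[key][0] != value:
            bad_pairs.append((first_seen[key][1], row))
        first_seen.setdefault(key, (value, row))
    return bad_pairs

rows = [
    {"A": 10, "B": "b1", "C": "c1"},
    {"A": 10, "B": "b2", "C": "c2"},
    {"A": 11, "B": "b4", "C": "c1"},
    {"A": 12, "B": "b3", "C": "c4"},
    {"A": 13, "B": "b1", "C": "c1"},
    {"A": 14, "B": "b3", "C": "c4"},
]

candidates = [("A", "B"), ("B", "C"), ("C", "B"), ("B", "A"), ("C", "A")]
holds = {f"{l} -> {r}": not violations(rows, [l], [r]) for l, r in candidates}
for fd, ok in holds.items():
    print(fd, "holds" if ok else "does not hold")
```

A dependency that holds in one state is not guaranteed to hold in general; a single state can only disprove a dependency, never prove it for the schema.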
c) Armstrong's axioms for functional dependencies:
1. Reflexivity: If Y is a subset of X then X -> Y.
2. Augmentation: If X -> Y then XZ -> YZ for any set Z.
3. Transitivity: If X -> Y and Y -> Z then X -> Z.
Soundness means that any functional dependency derived from a set F using Armstrong's axioms
holds in every relation instance that satisfies F, so the axioms never produce an incorrect dependency.
Completeness means that every functional dependency logically implied by F (every dependency in the
closure F+) can be derived from F using only Armstrong's axioms.
Question 3
a) How does a query tree represent a relational algebra expression? [5]
b) Differentiate DDL and DML. [4]
Consider the following schema:
Suppliers (sid: integer, sname: string, address: string)
Parts (pid: integer, pname: string, color: string)
Catalog (sid: integer, pid: integer, cost: real)
The key fields are underlined, and the domain of each field is listed after the field name. Therefore sid is
the key for Suppliers, pid is the key for Parts, and sid and pid together form the key for Catalog. The
Catalog relation lists the prices charged for parts by Suppliers.
Write the following queries in tuple relational calculus.
i. Find the names of suppliers who supply some red part. [3]
ii. Find the sids of suppliers who supply some red or green part. [3]
iii. Find the sids of suppliers who supply some red part or are at 221 Packer Street. [3]
iv. Find the sids of suppliers who supply some red part and some green part. [3]
c) Express the above queries (i) to (iv) in Relational algebra. [8]
SOLUTION:
a) A query tree represents a relational algebra expression as a tree data structure. The leaf nodes are
the input (base) relations of the query, and each internal node is a relational algebra operation such
as selection, projection, or join. The edges represent the flow of data: an internal node is executed
when its operands are available and its result becomes the input to its parent, so the tree is
evaluated bottom-up and the root produces the result of the query.
b) DDL (Data Definition Language) is used to define the structure of a database, including creating
tables and defining their attributes, while DML (Data Manipulation Language) is used to manipulate the
data within those tables, such as inserting, updating, or deleting records.
i. {t | ∃s ∈ Suppliers ∃c ∈ Catalog ∃p ∈ Parts (s.sid = c.sid ∧ c.pid = p.pid ∧ p.color = 'red'
∧ t.sname = s.sname)}
ii. {t | ∃c ∈ Catalog ∃p ∈ Parts (c.pid = p.pid ∧ (p.color = 'red' ∨ p.color = 'green') ∧ t.sid = c.sid)}
iii. {t | ∃s ∈ Suppliers (t.sid = s.sid ∧ (s.address = '221 Packer Street' ∨ ∃c ∈ Catalog ∃p ∈ Parts
(c.sid = s.sid ∧ c.pid = p.pid ∧ p.color = 'red')))}
iv. {t | ∃c1 ∈ Catalog ∃p1 ∈ Parts ∃c2 ∈ Catalog ∃p2 ∈ Parts (c1.sid = c2.sid ∧ c1.pid = p1.pid
∧ p1.color = 'red' ∧ c2.pid = p2.pid ∧ p2.color = 'green' ∧ t.sid = c1.sid)}
c)
i. πsname(σcolor='red'(Parts) ⋈ Catalog ⋈ Suppliers)
ii. πsid(σcolor='red' ∨ color='green'(Parts) ⋈ Catalog)
iii. πsid(σcolor='red'(Parts) ⋈ Catalog) ∪ πsid(σaddress='221 Packer Street'(Suppliers))
iv. πsid(σcolor='red'(Parts) ⋈ Catalog) ∩ πsid(σcolor='green'(Parts) ⋈ Catalog)
QUESTION 4:
Question 4
Suppose you and your friends are starting an e-commerce company which sells various kinds of
products in daily life like perfume and toys online. Now you are trying to design the company's website.
Based on the following requirements, design an ER diagram for the database of the website. For each
binary relationship you identified, state the cardinalities
(1:1, 1:m or m:n) on the entities participating in this relationship. [20]
• The database maintains the information of customers, including the customer's name, email
address, shipping address, billing address, credit card number, and phone number. In order to
arrange the shipment efficiently and reduce the cost, the shipping address is composed of
street, state and zip code.
• There are two kinds of customers, registered customer and non-registered customer.
Registered customers are identified by their registered ids, and for each non-registered
customer, a temporary id is used.
• A product has a product id, a name, its price, a supplier (from where this product is purchased)
and a description. Each product is identified by the product id.
• Each product has a number of items. All the items from a same product are
identical in looking, however, they are different in their item ids (imagine when you go to the
supermarket, although you buy two same things, they have different barcodes). In addition, each item
has a producing date. The item id alone is not enough to distinguish different items from all kinds of
products; instead, it must be associated with its corresponding product id.
• Each customer can order many items at a time. When he/she is making an order, the date, time, and
total amount of that order will be recorded. The total amount is not stored information but calculated
each time when a customer makes an order, by adding all the prices of items together.
• Each product belongs to one or more categories. For example, a photographer's book can
belong to both "book" and "photography". Each category includes many kinds of products. A
category has its category number, its category name, and is identified by the category number.
• For each registered customer, you will keep track of his/her favorite categories.
This will be useful when you suggest products for him/her in his/her future purchase. One customer
can favorite one or more categories, and for each of his/her favorites, you will keep record of the
number of purchases he/she made in this category.
SOLUTION:
The ER diagram for the database of the website is as follows:
![ER Diagram](https://i.imgur.com/1o9WvEa.png)
Explanation:
- The "Customer" entity has attributes such as "customer_id", "name", "email_address",
"phone_number", etc. The "customer_id" attribute is the primary key for this entity.
- The "Registered Customer" and "Non-Registered Customer" entities are subtypes of
the "Customer" entity. They have their own attributes such as "registered_id" for registered customers
and "temporary_id" for non-registered customers. The subtypes are connected to the supertype using
a discriminator attribute called "customer_type".
- The "Address" entity has attributes such as "street", "state", and "zip_code". It is connected
to the "Customer" entity using two relationships, one for shipping address and one for billing address.
- The "Credit Card" entity has an attribute called "credit_card_number". It is connected to
the "Customer" entity using a relationship.
- The "Product" entity has attributes such as "product_id", "name", "price" and "description". It is
connected to the supplier using a relationship.
- The "Item" entity has attributes such as item_id and producing_date. It is a weak entity: an item is
identified by its item_id together with the product_id of its product. The relationship between
Product and Item is 1:m.
- The relationship between the entities Product and Category is many-to-many (m:n), which means that
a product can belong to multiple categories, and a category can have multiple products. Therefore, we
need an intermediate table called Product_Category that connects these two entities.
- The Order entity has attributes such as order_id, date and time. It is connected to
both Customer and Item entities using relationships: a customer can place many orders (1:m) and an
order can contain many items (1:m). The total amount is a derived attribute, calculated by adding the
prices of all items in the order rather than being stored.
- Finally, we keep track of each registered customer's favorite categories in another table
called Favorite_Category. It represents a many-to-many (m:n) relationship between Registered
Customer and Category and carries the attribute number_of_purchases.

SELECT P.title, E.name

5. Join the tables `emp_proj` and `proj_dur`.

ii. R is not in BCNF as there is a functional dependency CE → D where CE is not a superkey. To

Therefore, t'[X] = x implies that X → YZ is also a functional dependency in R.

osition (if any).

(start and end

b) Shadow paging is a technique used in database management systems to ensure

c) Recursive relationship in ER modeling refers to a relationship between entities where anentity is

c) Let R(A,B) be a relation with functional dependencies F.

SELECT COUNT(column_name) FROM table_name; ii.

DROP TABLE table_name;

USE database_name; iv.

ALTER TABLE table_name ADD column_name datatype;

SELECT column1, column2, ... FROM table_name ORDER BY column1 ASC/DESC; b)

Here is a diagram illustrating how symmetric and asymmetric encryption work:

iii. πaid(σfrom='Bonn' ∧ to='Madras'(Flights) ⨝ σcruisingrange>distance(Aircraft)) - This query first

4.c) i. {eid | ∃eid, aid (Certified(eid, aid) ∧ Aircraft(aid, 'Boeing'))}