0% found this document useful (0 votes)

183 views

CSE 444 Practice Problems

Plan B may be faster than Plan A for two reasons: 1. Hash join in Plan B is likely to be more efficient than nested loops join in Plan A. Hash join typically has better performance than nested loops join when the relations are large. 2. Applying selection conditions earlier in Plan B (on A before the join) may reduce the amount of data that needs to be processed in subsequent operations. Applying selection as early as possible improves efficiency by reducing the data that needs to be processed in later operations.

Uploaded by

Yaikob Kebede

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

183 views

CSE 444 Practice Problems

Uploaded by

Yaikob Kebede

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 13

CSE 444 Practice Problems

DBMS Architecture

1. Data Independence

(a) What is physical data independence?

Solution:
Physical data independence is a property of a DBMS that ensures decoupling between the physical
layout of data and applications which access it. In other words, with physical independence,
applications are insulated from changes in physical storage details: changes in how the data is
stored do not cause application changes.

(b) What properties of the relational model facilitate physical data independence?
Solution:
Declarative query language or set-at-a-time query language.

1
(c) What is logical data independence?
Solution:
Logical data independence is a property of a DBMS that ensures decoupling between the logical
structure of data and applications that operate on it. With this property, changes in the logical
data layout like tables, rows, and columns do not require application to be changed.

(d) How can one provide a high level of logical data independence with the relational model?
Solution:
By defining views.

2
2. High-Level DBMS Architecture
You should know the key components of a relational DBMS. Please see the lecture notes for an overview.
You should also be able to discuss what you implemented in the labs. For example, we could ask you
for a high-level description of your buffer manager.

3
3. Data Storage and Indexing
You should be able to show what happens when one adds/removes data to/from a B+ Tree. Please
see the web quizzes for some good examples.

4
4. Data Storage and Indexing
Suppose we have a relation R(a,b,c,d,e) and there are at least 1000 distinct values for each of the
attributes.
Consider each of the following query workloads, independently of each other. If it is possible to speed it
up significantly by adding up to two additional indexes to relation R, specify for each index (1) which
attribute or set of attributes form the search key of the index, (2) if the index should be clustered or
unclustered, (3) if the index should be a hash-based index or a B+-tree. You may add at most two
new indexes. If adding a new index would not make a significant difference, you should say so. Give a
brief justification for your answers.

(a) 100,000 queries have the form: select * from R where b <?
10,000 queries have the form: select * from R where c =?

Solution:
Since we need efficient range-queries on R(b), we definitely want a clustered, B+-tree index on
R(b).
For the second query, an index on R(c) will help. It can be either a B+-tree or a hash-based index
since queries look-up specific key values. The index must be unclustered since the index on R(b)
is clustered. That is fine, however, since queries will look-up specific key values and we know that
there are many distinct values in the relation.

(b) 100,000 queries have the form: select * from R where b <? and c =?
10,000 queries have the form: select * from R where d =?
1,000 queries have the form: select * from R where a =?

Solution:
For the first query, a clustered B+-tree index on R(c,b) would be most helpful since we could use
it to look-up all data items that match both the given value on c and the range on b.
Since we can only add a second index, we will favor the most frequent query and add an index on
R(d). As in the question above, this index must be unclustered and can be either a B+-tree or a
hash-based index.

5
(c) 100,000 queries have the form: select a, c from R where b <?
10,000 queries have the form: select * from R where d <?

Solution:
Since both queries are range-selection queries, we need clustered indexes for both of them, but we
cannot have more than one such index. However, we can have a covering index.
We thus recommend:
• A clustered, B+-tree index on R(d).
• An unclustered, B+-tree index on R(b,a,c). This is also a covering index in that we only need
to use the index to answer the query. We don’t need to touch the data.

6
5. Relational Algebra and Query Processing
Consider three tables R(a,b,c), S(d,e,f), and T(g,h,i).

(a) Consider the following SQL query:

SELECT R.b
FROM R, S, T
WHERE R.a = S.d
AND S.e = T.g
AND T.h > 21
AND S.f < 50
GROUP BY R.b
HAVING count(*) > 2
For each of the following relational algebra expressions, indicate if it is a correct translation of
the above query or not.
i. πR.b (σTOTAL>2 (γR.b,count(∗)→TOTAL (σT.h>21 AND S.f <50 (R 1R.a=S.d (S 1S.e=T.g T )))))

CORRECT INCORRECT
Solution:
CORRECT.

ii. πR.b (σTOTAL>2 (γR.b,count(∗)→TOTAL (R 1R.a=S.d ((σS.f <50 (S)) 1S.e=T.g (σT.h>21 (T ))))))

CORRECT INCORRECT
Solution:
CORRECT.

iii. πR.b (σTOTAL>2 (σT.h>21 AND S.f <50 (γR.b,count(∗)→TOTAL (R 1R.a=S.d (S 1S.e=T.g T )))))

CORRECT INCORRECT
Solution:
INCORRECT.

7
(b) For the following SQL query, show an equivalent relational algebra expression. You can give the
expression in the same format as we used above or you can draw it in the form of an expression
tree or logical query plan.
SELECT R.b
FROM R, S
WHERE R.a = S.d
AND R.b NOT IN (SELECT R2.b FROM R as R2, T WHERE R2.b = T.g)
Solution:
πR.b (R 1R.a=S.d S) − πR.b (R 1R.b=T.g T )

8
(c) A user just connected to a database server and submitted a SQL query in the form of a string.
Give four important steps involved in evaluating that SQL query and the order in which they
are performed. You only need to name the steps. No need to explain them.
Solution:

(d) Query parsing: the DBMS parses the SQL string into an internal tree representation of the
query. Typically, a variety of checks are also performed at that time including syntax checks,
authorization checks, simple integrity constraint checks, etc.
(e) Query rewrite: the parse-tree is converted into an initial logical query plan. During that step,
views are also rewritten and queries are flattened if possible.
(f) Query optimization: the query optimizer searches for an efficient physical query plan to execute
the query.
(g) Query execution: the DBMS actually executes the query and returns the results to the user.

9
(h) What is the difference between a logical and a physical query plan?

Solution:
A logical query plan is an extended relational algebra tree. A physical query plan is a logical query
plan with extra annotations that specify the (1) access path to use for each relation (whether to use
an index or a file scan and which index to use), the (2) implementation to use for each relational
operator, and (3) how the operators should be executed (pipelined execution, intermediate result
materialization, etc.).

10
6. Relational Algebra and Query Processing
Consider four tables R(a,b,c), S(d,e,f), T(g,h), U(i,j,k).

(a) Consider the following SQL query:

SELECT R.b, avg(U.k) as avg
FROM R, S, T, U
WHERE R.a = S.d
AND S.e = T.g
AND T.h = U.i
AND U.j = 5
AND (R.c + S.f) < 10
GROUP BY R.b
Draw a logical plan for the query. You may chose any plan as long as it is correct (i.e. no need to
worry about efficiency).

Solution:
Many solutions were possible including:

π R.b,avg

γ R.b, avg(U.k) -> avg

σ (U.j = 5) ∧ (R.c+S.f < 10)

T.h = U.i

S.e = T.g
U

R.a = S.d
T

R S

11
(b) Consider the following two physical query plans. Give two reasons why plan B may be faster
than plan A. Explain each reason.

(On the fly) π A.d,B.d (On the fly) π A.d,B.d

(On the fly) σ B.c < 2000

A.b = B.b
(Hash join)

(Index nested loop) σ a>10 σ B.c < 2000

A.b = B.b

(Use B+ tree index) A B

σ a>10 (File scan) (File scan)

A B
(B+ tree index on A.a) (B+ tree index on B.b)

Plan A Plan B

Solution:
The following are three possible reasons why plan B could be faster than plan A:
• The low selectivity of the selection predicate (a > 10) and an unclustered index on A.a may
make a file scan of A faster than using the index.
• If there are lots of matches, hash joins may be faster than index nested loops. The hash join
reads its input only once (unless the input is too large). The index-nested loop may end-up
reading the same pages of B multiple times.
• Pushing the selection (B.c < 2000) down can get rid of lots of B tuples before the join,
reducing the cost of that operation.

12
7. Operator Algorithms
Relation R has 90 pages. Relation S has 80 pages. Explain how a DBMS could efficiently join these
two relations given that only 11 pages can fit in main memory at a time. Your explanation should be
detailed: specify how many pages are allocated in memory and what they are used for; specify what
exactly is written to disk and when.

(a) Present a solution that uses a hash-based algorithm.

Solution:
The algorithm proceeds as follows:
• First, we split R into partitions:
– Allocate one page for the input buffer.
– Allocate 10 pages for the output buffers: one page per partition.
– Read in R one page at the time. Hash into 10 buckets. As the pages of the different
buckets fill-up, write them to disk. Once we process all of R, write remaining incomplete
pages to disk. At the end of this step, we have 10 partitions of R on disk. Assuming
uniform data distribution, each partition comprises 9 pages.
• Then, we split S into partitions the same way we split R (must use same hash function).
• For each pair of partitions that match:
– Allocate one page for input buffer
– Allocate one page for the output buffer.
– Read one 9-page partition of R into memory. Create a hash table for it using a different
hash function than above.
– Read corresponding S partition into memory one page at the time. Probe the hash table
and output matches.

(b) Present a solution that uses a sort-based algorithm.

Solution:
The algorithm proceeds as follows
• First, we sort R:
– Read 10 pages worth of R tuples into memory, sort, and write to disk.
– Repeat for next 10 pages until all R tuples have been processed.
– Now we have 9 runs of 10 pages sorted on disk.
– Allocate one page per run and one page for the output.
– Merge runs in sorted order and output into file.
• Sort S the same way that we sorted R above.
• Read S and R one page at the time. Merge them and output matches.
Note that we can be more efficient if we use a priority queue and output runs of length twice
the size of memory in the first step of the algorithm. Then, we can join R and S together while
merging their individual runs.

Chapter - 1 - Query Optimization
No ratings yet
Chapter - 1 - Query Optimization
38 pages
Final 15
No ratings yet
Final 15
7 pages
Guc 437 59 31055 2023-05-25T16 41 09
No ratings yet
Guc 437 59 31055 2023-05-25T16 41 09
15 pages
Query Optimization
No ratings yet
Query Optimization
103 pages
Chap12 Practice Key
No ratings yet
Chap12 Practice Key
3 pages
SE3060 - Database Systems
No ratings yet
SE3060 - Database Systems
6 pages
Midterm 13w2
No ratings yet
Midterm 13w2
8 pages
QEII
No ratings yet
QEII
44 pages
NR 410210 Database Management System
No ratings yet
NR 410210 Database Management System
8 pages
DBMS
No ratings yet
DBMS
15 pages
Session - 10 Querying
No ratings yet
Session - 10 Querying
36 pages
Final Review
No ratings yet
Final Review
96 pages
HW 3 Sol
No ratings yet
HW 3 Sol
8 pages
CMSPCOR02T Final Question Paper 2022
No ratings yet
CMSPCOR02T Final Question Paper 2022
2 pages
13 QP1
No ratings yet
13 QP1
33 pages
Midterm 15w2
No ratings yet
Midterm 15w2
8 pages
Introduction To Database Management Systems CS470
No ratings yet
Introduction To Database Management Systems CS470
11 pages
Practice Optimizer Blank
No ratings yet
Practice Optimizer Blank
7 pages
CSE 444 Practice Problems
No ratings yet
CSE 444 Practice Problems
8 pages
Query Execution
No ratings yet
Query Execution
87 pages
Cs403 Collection of Old Papers
100% (1)
Cs403 Collection of Old Papers
19 pages
Dbms Lab Manual 2015-16
No ratings yet
Dbms Lab Manual 2015-16
50 pages
Execution
No ratings yet
Execution
37 pages
Hash Tables and Query Execution: March 1st, 2004
No ratings yet
Hash Tables and Query Execution: March 1st, 2004
32 pages
Lec6 QP Indexing
No ratings yet
Lec6 QP Indexing
40 pages
Chapter15 1
No ratings yet
Chapter15 1
43 pages
7-Query Processing
No ratings yet
7-Query Processing
47 pages
m1 M Tech Topics in Database Technology 01cs6103 Dec 2017
No ratings yet
m1 M Tech Topics in Database Technology 01cs6103 Dec 2017
3 pages
Assignment DBMS
No ratings yet
Assignment DBMS
6 pages
Dbmsass 2
No ratings yet
Dbmsass 2
2 pages
dbms - 4th sem
No ratings yet
dbms - 4th sem
2 pages
MidtermPracticeQuestions Solutions
No ratings yet
MidtermPracticeQuestions Solutions
6 pages
COMP3278_Dec2015
No ratings yet
COMP3278_Dec2015
10 pages
An Introduction To Database Systems Bipin C.desaI
No ratings yet
An Introduction To Database Systems Bipin C.desaI
849 pages
Query Evaluation
No ratings yet
Query Evaluation
51 pages
UEU Basis Data Pertemuan 14
No ratings yet
UEU Basis Data Pertemuan 14
32 pages
B.Sc._H_Computer_Sc_RYNrJ3F
No ratings yet
B.Sc._H_Computer_Sc_RYNrJ3F
8 pages
Chapter 2-Query Processing and Optimi
No ratings yet
Chapter 2-Query Processing and Optimi
43 pages
unit 3_DBMS
No ratings yet
unit 3_DBMS
15 pages
Overview of Query Evaluation: R&G Chapter 12
No ratings yet
Overview of Query Evaluation: R&G Chapter 12
30 pages
Dbms Ques & Ans-1
No ratings yet
Dbms Ques & Ans-1
9 pages
Dbms ND 2011 - Cse Tube
No ratings yet
Dbms ND 2011 - Cse Tube
4 pages
Advanced Database Systems Chapter One Query Processing & Optimization
No ratings yet
Advanced Database Systems Chapter One Query Processing & Optimization
22 pages
Reg No.: - Name: - Name
No ratings yet
Reg No.: - Name: - Name
4 pages
rec-1975
No ratings yet
rec-1975
6 pages
Chapter - 2 Query Processing
No ratings yet
Chapter - 2 Query Processing
63 pages
ADBT Unit-1
No ratings yet
ADBT Unit-1
17 pages
Lecture No 04: By: Syed Aun Irtaza
No ratings yet
Lecture No 04: By: Syed Aun Irtaza
10 pages
05 QueryProcessing LecW4 Feb7 22
No ratings yet
05 QueryProcessing LecW4 Feb7 22
55 pages
rr220502 Data Base Management Systems
No ratings yet
rr220502 Data Base Management Systems
8 pages
Take Assessment: Exercise 6: Index Choice and Query Optimization
No ratings yet
Take Assessment: Exercise 6: Index Choice and Query Optimization
7 pages
QueryProcess Optim
No ratings yet
QueryProcess Optim
60 pages
CS 345: Topics in Data Warehousing: Thursday, October 21, 2004
No ratings yet
CS 345: Topics in Data Warehousing: Thursday, October 21, 2004
29 pages
Eee 207 Final Cat E&e, Bmed
No ratings yet
Eee 207 Final Cat E&e, Bmed
33 pages
Unit Iii & Iv
No ratings yet
Unit Iii & Iv
4 pages
Assignment-1 to 2 DBMS
No ratings yet
Assignment-1 to 2 DBMS
3 pages
Dbms Question Bank
100% (1)
Dbms Question Bank
28 pages
Lecture Notes
No ratings yet
Lecture Notes
96 pages
Couchbase Certified Java Developer - Exam Practice Tests
From Everand
Couchbase Certified Java Developer - Exam Practice Tests
Cristian Scutaru
No ratings yet
The Essential R Reference
From Everand
The Essential R Reference
Mark Gardener
No ratings yet
OS Chapter One and Two Sample Questions - 2
No ratings yet
OS Chapter One and Two Sample Questions - 2
31 pages
01chapter One-Introduction To OOP
No ratings yet
01chapter One-Introduction To OOP
17 pages
Quiz 2: 6.893 Database Systems: Fall 2004
No ratings yet
Quiz 2: 6.893 Database Systems: Fall 2004
12 pages
Advanced Database Management Systems: Query Processing: Chapter 1
No ratings yet
Advanced Database Management Systems: Query Processing: Chapter 1
23 pages
Lab One Manual
No ratings yet
Lab One Manual
2 pages
Oodbms Chapter 1
No ratings yet
Oodbms Chapter 1
2 pages
Wollega University: M.Sc. Proposal On
100% (1)
Wollega University: M.Sc. Proposal On
45 pages
Example: Array Scanning Searching
No ratings yet
Example: Array Scanning Searching
5 pages
Lab Two
No ratings yet
Lab Two
3 pages
1 W Kirubel Tesfaye PDF
No ratings yet
1 W Kirubel Tesfaye PDF
114 pages
Discrete Maths Worksheet
No ratings yet
Discrete Maths Worksheet
2 pages
Declaration: GSM Based Home Security System February 17, 2015
No ratings yet
Declaration: GSM Based Home Security System February 17, 2015
31 pages
Crime File Management System Project Report JAVA
100% (2)
Crime File Management System Project Report JAVA
153 pages
BW4HANA - DataFlow - LSA++ PDF
No ratings yet
BW4HANA - DataFlow - LSA++ PDF
1 page
Direct Discovery Technical Addendum 11.2 SR11 and Sense
No ratings yet
Direct Discovery Technical Addendum 11.2 SR11 and Sense
31 pages
253163201 Hotel Management
No ratings yet
253163201 Hotel Management
90 pages
Billing Management Console
No ratings yet
Billing Management Console
5 pages
SQL Part 4 Practice Questions and Answers
No ratings yet
SQL Part 4 Practice Questions and Answers
4 pages
Owsum: Algorithmic Odor Prediction and Insight Into Structure-Odor Relationships
No ratings yet
Owsum: Algorithmic Odor Prediction and Insight Into Structure-Odor Relationships
11 pages
Oracle Recovery Manager (RMAN)
No ratings yet
Oracle Recovery Manager (RMAN)
10 pages
Data Warehousing: Learning Objectives For Chapter 3
100% (1)
Data Warehousing: Learning Objectives For Chapter 3
24 pages
Midterm Test Kom4043-Pembangunan Sistem Pangkalan Data (2 HOURS)
No ratings yet
Midterm Test Kom4043-Pembangunan Sistem Pangkalan Data (2 HOURS)
7 pages
Dbdesign Airbnb
No ratings yet
Dbdesign Airbnb
17 pages
Database Systems A Practical Approach to Design Implementation and Management 4th Edition Thomas M. Connolly - The latest updated ebook is now available for download
No ratings yet
Database Systems A Practical Approach to Design Implementation and Management 4th Edition Thomas M. Connolly - The latest updated ebook is now available for download
32 pages
Data Processing SS1 3RD TERM
100% (2)
Data Processing SS1 3RD TERM
18 pages
Pavan Salesforce Developer
No ratings yet
Pavan Salesforce Developer
8 pages
Hibernate ORM 5.3.15.final User Guide
No ratings yet
Hibernate ORM 5.3.15.final User Guide
663 pages
LO2 Provide Advice On Software, Hardware or Network
100% (1)
LO2 Provide Advice On Software, Hardware or Network
11 pages
Introduction To R Programming PDF
No ratings yet
Introduction To R Programming PDF
5 pages
Transaction Processing and Enterprise Resource Planning Systems
No ratings yet
Transaction Processing and Enterprise Resource Planning Systems
12 pages
19bct0013 Dbms Lab Finall
No ratings yet
19bct0013 Dbms Lab Finall
8 pages
The Hardest Thing in Data Science - Caffeinated Data Science
No ratings yet
The Hardest Thing in Data Science - Caffeinated Data Science
5 pages
Exploring IBM Notes/Domino Activity Logging and Activity Trends
No ratings yet
Exploring IBM Notes/Domino Activity Logging and Activity Trends
35 pages
Presentation On Python Code To Modelling in Oil & Gas
No ratings yet
Presentation On Python Code To Modelling in Oil & Gas
18 pages
Data Science Course Syllabus
No ratings yet
Data Science Course Syllabus
37 pages
1) Why Database Design Is Important?
No ratings yet
1) Why Database Design Is Important?
3 pages
Coding Interview Prep Document Application Engineer, SAP: Time To Prepare!
No ratings yet
Coding Interview Prep Document Application Engineer, SAP: Time To Prepare!
7 pages
Homework 3 Solutions
No ratings yet
Homework 3 Solutions
4 pages
Third Year Engineering: 21BTCS604 - Advanced DBMS
No ratings yet
Third Year Engineering: 21BTCS604 - Advanced DBMS
24 pages
Nutri Care Report
No ratings yet
Nutri Care Report
52 pages
General Architecture of Text Mining Systems
No ratings yet
General Architecture of Text Mining Systems
6 pages
QL BH+GV
No ratings yet
QL BH+GV
27 pages
12-CS H.YPB-2
No ratings yet
12-CS H.YPB-2
8 pages