CSE 444 Practice Problems

Query Optimization

1. Query Optimization
Given the following schema and SQL query:
Student (sid, name, age, address)
Book(bid, title, author)
Checkout(sid, bid, date)

SELECT S.name
FROM Student S, Book B, Checkout C
WHERE S.sid = C.sid
AND B.bid = C.bid
AND B.author = 'Olden Fames'
AND S.age > 12
AND S.age < 20
And assuming:
• There are 10,000 Student records stored on 1,000 pages.
• There are 50,000 Book records stored on 5,000 pages.
• There are 300,000 Checkout records stored on 15,000 pages.
• There are 500 different authors.
• Student ages range from 7 to 24.

(a) Show a physical query plan for this query, assuming there are no indexes and data is not sorted
on any attribute.
Solution:
Note: many solutions are possible.

Π name (on the fly)
  σ 12 < age < 20 ∧ author = 'Olden Fames' (on the fly)
    ⋈ bid (tuple-based nested loop)
      ⋈ sid (block nested loop)
        Scan: Student
        Scan: Checkout
      Scan: Book

Figure 1: One possible query plan (all joins are nested-loop joins)

(b) Compute the cost of this query plan and the cardinality of the result.
Solution:
Operator       Cost                      Cardinality                  Remarks
S ⋈ C          B(S) + B(S) * B(C)        300000 (foreign-key join)    (1)
               = 1000 + 1000 * 15000
               = 15001000
(S ⋈ C) ⋈ B    T(S ⋈ C) * B(B)           300000 (foreign-key join)    (2)
               = T(C) * B(B)
               = 300000 * 5000
               = 1500000000
σ and Π        On the fly                300000 * σauthor * σage      (3)
                                         = 300000 * 1/500 * 7/18
                                         ≈ 234
Total          1515001000                234
(1) We use a page-at-a-time nested loop join, and the output is pipelined to the next join.
(2) The outer relation is pipelined from the join below, so we do not need a scanning term for
the outer relation.
(3) We assume uniform value distributions for age and author. We assume independence among
participating columns.
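
The arithmetic in the table can be checked with a short script. This is only a sketch under the same assumptions as the solution (page-at-a-time nested loops, pipelined intermediate results, uniform and independent value distributions); the variable names are ours, not part of the original problem.

```python
from fractions import Fraction

# Statistics given in the problem statement.
T_S, B_S = 10_000, 1_000      # Student: tuples, pages
T_B, B_B = 50_000, 5_000      # Book: tuples, pages
T_C, B_C = 300_000, 15_000    # Checkout: tuples, pages
V_AUTHOR = 500                # number of distinct authors
AGES = range(7, 25)           # ages 7..24 inclusive (18 values)

# (1) Nested loop join with Student as outer and Checkout as inner.
cost_s_c = B_S + B_S * B_C
card_s_c = T_C                # foreign-key join: one Student per Checkout

# (2) Tuple-based nested loop with Book as inner. The outer input is
#     pipelined, so Book is scanned once per outer tuple.
cost_sc_b = card_s_c * B_B
card_sc_b = T_C               # foreign-key join: one Book per Checkout

# (3) Selections are applied on the fly and cost no I/O.
sel_author = Fraction(1, V_AUTHOR)
sel_age = Fraction(sum(1 for a in AGES if 12 < a < 20), len(AGES))
card_result = card_sc_b * sel_author * sel_age

total_cost = cost_s_c + cost_sc_b
print(total_cost)             # 1515001000
print(float(card_result))     # about 233.3, which the solution rounds up to 234
```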

(c) Suggest two indexes and an alternate query plan for this query.
Solution:
Note: many solutions are possible. For purposes of illustration, we assume an unclustered B+ tree
index on Book.author and a clustered B+ tree index on Checkout.bid:

Π name (on the fly)
  σ 12 < age < 20 (on the fly)
    ⋈ sid (block nested loop)
      Π sid (on the fly)
        ⋈ bid (indexed nested loop)
          Π bid
            σ author = 'Olden Fames' (Index Scan: Book)
          Index Scan: Checkout
      Scan: Student

Figure 2: One possible query plan that uses the two indexes

(d) Compute the cost of your new plan.

Solution:
N(B) = number of tuples per page for Book = T(B)/B(B) = 10
N(C) = number of tuples per page for Checkout = T(C)/B(C) = 20

Operator          Cost                                 Cardinality                          Remarks
Index Scan        T(B) * 1/V(B, author)                100                                  (1)
on Book           = 50000 * 1/500
with σauthor      = 100
Π sid (B ⋈ C)     100 * ⌈(T(C)/V(C, bid))/N(C)⌉        100 * T(C)/max(100, V(C, bid))       (2)
                  = 100 * ⌈(300000/50000)/20⌉          = 600
                  = 100
⋈ sid             B(S) = 1000                          ≈ 234                                (3)
Total             1200
(1) We assume all intermediate index pages are in memory. Note: because bid is the search-key
for the index, the bid values are in the leaf pages of the index: they are thus in memory as per the
first assumption. Because we project on bid right after the selection, we only need these values,
so we do not really need to perform any disk I/Os.
(2) One index lookup per outer tuple. Assuming uniform distribution, there will be 6 checkouts
per book. Assuming all intermediate index pages are in memory, the 6 records can be fetched
with only one or two disk accesses since we have a clustered index on Checkout.bid. The above
computation is optimistic but it only incurs 100 more I/Os in the worst case.
(3) Again, the output of the previous operation is projected on sid. Because there are only 600
tuples, it is reasonable to assume all results can hold in memory. Since the outer relation is
already in-memory, we only need to scan the inner relation Student one time.
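
As a sanity check, the figures in the table can be reproduced with a few lines of Python. This sketch mirrors the solution's assumptions (index pages in memory, uniform distributions, one page fetch per index lookup); the variable names are ours, and V(C, bid) = 50000 is taken from the formula in the table.

```python
import math

# Statistics from the problem statement.
T_B = 50_000                 # Book tuples
T_C, B_C = 300_000, 15_000   # Checkout: tuples, pages
B_S = 1_000                  # Student pages
V_AUTHOR = 500               # distinct authors
V_C_BID = 50_000             # distinct bid values in Checkout (as used in the table)

n_c = T_C // B_C             # Checkout tuples per page: 20

# (1) Index scan on Book.author: the table charges T(B) / V(B, author) I/Os.
book_matches = T_B // V_AUTHOR
cost_book = book_matches

# (2) Index nested loop into the clustered index on Checkout.bid:
#     one lookup per outer tuple, each touching ceil(6 / 20) = 1 data page.
checkouts_per_book = T_C / V_C_BID
pages_per_lookup = math.ceil(checkouts_per_book / n_c)
cost_checkout = book_matches * pages_per_lookup
card_checkout = book_matches * T_C // max(book_matches, V_C_BID)

# (3) The 600 projected sid values stay in memory; Student is scanned once.
cost_student = B_S

total = cost_book + cost_checkout + cost_student
print(cost_book, cost_checkout, cost_student, total)   # 100 100 1000 1200
print(card_checkout)                                   # 600
```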

(e) Explain the steps that the Selinger query optimizer would take to optimize this query.
Solution:
A query optimizer explores the space of possible query plans to find the most promising one. The
Selinger query optimizer performs the search as follows:
• Only considering left-deep query plans.
Instead of enumerating all possible plans and evaluating their costs, the optimizer keeps the
efficient pipelined execution model in mind. Thus, it only looks for left-deep query plans and
enumerates different join orders. It considers cartesian products as late as possible to reduce
I/O costs. It considers only nested-loop and sort-merge joins.
• In bottom-up fashion.
The optimizer starts by finding the best plan for one relation. It then expands the plan
by adding one relation at a time as an inner relation. For each level, it keeps track of the
cheapest plan per interesting output order, which will be explained shortly, as well as the
cheapest plan overall. When computing the cost of a plan, the Selinger optimizer considers both
I/O cost and CPU cost.
• Considering interesting orders.
If the query has an ORDER BY or a GROUP BY clause, having results ordered by the
columns that appear in those clauses can reduce the cost of the query plan because it can
save extra I/Os needed by sort or aggregation. Similarly, attributes that appear in join
conditions are considered interesting orders because they reduce the cost of sort-merge joins.
When the Selinger optimizer evaluates a plan, at each stage, it keeps track of the cheapest
plan per interesting order in addition to the cheapest plan overall.
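
The bottom-up search over left-deep plans can be sketched in a few lines of Python. This is a simplified illustration, not the actual System R algorithm: it assumes a toy cost model (the sum of intermediate result sizes), ignores physical join methods, CPU cost and interesting orders, and the function name and input format are hypothetical.

```python
from itertools import combinations

def best_left_deep_plan(card, join_sel):
    """Toy Selinger-style dynamic programming over left-deep join orders.

    card:     {relation name: estimated tuple count after local selections}
    join_sel: {frozenset({r1, r2}): selectivity of the join predicate}
    Returns (cartesian_products, cost, output_rows, join_order).
    """
    rels = sorted(card)
    # Level 1: the best plan for a single relation is simply reading it.
    best = {frozenset([r]): (0, 0.0, float(card[r]), (r,)) for r in rels}

    # Level k: extend the best plan of each (k-1)-subset by one inner relation.
    for size in range(2, len(rels) + 1):
        for subset in map(frozenset, combinations(rels, size)):
            candidates = []
            for inner in sorted(subset):
                crosses, cost, rows, order = best[subset - {inner}]
                # Join predicates that connect `inner` to the outer plan.
                preds = [join_sel[frozenset((r, inner))]
                         for r in order if frozenset((r, inner)) in join_sel]
                sel = 1.0
                for p in preds:
                    sel *= p
                out_rows = rows * card[inner] * sel
                candidates.append((crosses + (not preds),  # cartesian products so far
                                   cost + out_rows,        # toy cost model
                                   out_rows,
                                   order + (inner,)))
            # Prefer fewer cartesian products, then lower cost.
            best[subset] = min(candidates)
    return best[frozenset(rels)]

# Problem 1 statistics, with the selections on age and author already applied.
card = {"Student": 10_000 * 7 / 18, "Book": 100, "Checkout": 300_000}
join_sel = {frozenset(("Student", "Checkout")): 1 / 10_000,
            frozenset(("Book", "Checkout")): 1 / 50_000}
crosses, cost, rows, order = best_left_deep_plan(card, join_sel)
print(order)   # ('Book', 'Checkout', 'Student')
```

Under this toy model the search starts from the small filtered Book relation, joins Checkout next, and adds Student last, which matches the shape of the indexed plan in part (c).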

2. Query Optimization
Consider the following SQL query that finds all applicants who want to major in CSE, live in Seattle,
and go to a school ranked better than 10 (i.e., rank < 10).
Relation                            Cardinality   Number of pages   Primary key
Applicants (id, name, city, sid)    2,000         100               id
Schools (sid, sname, srank)         100           10                sid
Major (id, major)                   3,000         200               (id, major)

SELECT A.name
FROM Applicants A, Schools S, Major M
WHERE A.sid = S.sid AND A.id = M.id
AND A.city = 'Seattle' AND S.srank < 10 AND M.major = 'CSE'

And assuming:
• Each school has a unique rank number (srank value) between 1 and 100.
• There are 20 different cities.
• Applicants.sid is a foreign key that references Schools.sid.
• Major.id is a foreign key that references Applicants.id.
• There is an unclustered, secondary B+ tree index on Major.id and all index pages are in memory.

(a) What is the cost of the query plan below? Count only the number of page I/Os.

(6) π name (on-the-fly)
  (5) σ major = 'CSE' (on-the-fly)
    (4) ⋈ id = id (index nested loop)
      (3) ⋈ sid = sid (sort-merge)
        (1) σ city = 'Seattle'
          Applicants (file scan)
        (2) σ srank < 10
          Schools (file scan)
      Major (B+ tree index on id)

Solution:
The total cost of this query plan is 119 I/Os, computed as follows:
• (1) The cost of scanning Applicants is 100 I/Os. The output of the selection operator is
100/20 = 5 pages or 2000/20 = 100 tuples.
• (2) The cost of scanning Schools is 10 I/Os. The selectivity of the predicate on srank is
(10 − 1)/100 = 0.09. The output is thus 0.09 ∗ 10 ≈ 1 page or 0.09 ∗ 100 = 9 tuples.
• (3) Given that the input to this operator is only six pages, we can do an in-memory sort-merge
join. The cardinality of the output will be 9 tuples. There are two ways to compute this: (a)
(100 ∗ 9)/max(100, 9) = 9 (see book Section 15.2.1 on page 484) or (b) consider that this is a
key-foreign key join and each applicant can match with at most one school, but keep in mind that
the predicates on city and rank are independent, hence only 0.09 of the applicants end up with
a matching school.
• (4) The index-nested loop join must perform one look-up for each input tuple in the outer
relation. We assume that each student only declares a handful of majors, so all the matches
fit in one page. The cost of this operator is thus 9 I/Os.
• (5) and (6) are done on-the-fly, so there are no I/Os associated with these operators.
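
The per-operator costs above add up as follows. This is a small sketch using the statistics from the problem; the variable names are ours, and we assume the 100 in max(100, 9) is the number of distinct sid values on the Applicants side.

```python
# Statistics from the problem statement.
T_A, B_A = 2_000, 100      # Applicants: tuples, pages
T_S, B_S = 100, 10         # Schools: tuples, pages
N_CITIES = 20
V_A_SID = 100              # assumption: distinct sid values among applicants

# (1) File scan of Applicants followed by city = 'Seattle'.
cost_1 = B_A
tuples_1 = T_A // N_CITIES             # 100 tuples
pages_1 = B_A // N_CITIES              # 5 pages

# (2) File scan of Schools followed by srank < 10 (ranks 1..9 out of 1..100).
cost_2 = B_S
tuples_2 = sum(1 for rank in range(1, T_S + 1) if rank < 10)   # 9 tuples

# (3) In-memory sort-merge join: no additional I/O.
cost_3 = 0
tuples_3 = tuples_1 * tuples_2 // max(V_A_SID, tuples_2)       # 9 tuples

# (4) Index nested loop into Major: one data-page I/O per outer tuple.
cost_4 = tuples_3

total = cost_1 + cost_2 + cost_3 + cost_4
print(total)   # 119
```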

(b) The Selinger optimizer uses a dynamic programming algorithm coupled with a set of heuristics
to enumerate query plans and limit its search space. Draw two query plans for the above query
that the Selinger optimizer would NOT consider. For each query plan, indicate why it would not
be considered.
Solution:
Many solutions were possible, including:

Plan 1. A plan that joins Schools with Major first would not be considered because it would
require a cartesian product that can be avoided.

π name
  σ major = 'CSE' and city = 'Seattle'
    ⋈ id = id (nested loop)
      (nested loop, cartesian product)
        σ srank < 10
          Schools (file scan)
        Major (file scan)
      Applicants (file scan)

Plan 2. Right-deep plans are not considered either.

π name
  σ major = 'CSE'
    ⋈ id = id (nested loop)
      Major (file scan)
      ⋈ sid = sid (sort-merge join and write to file)
        σ city = 'Seattle'
          Applicants (file scan)
        σ srank < 10
          Schools (file scan)

3. Query Optimization
Consider the schema R(a,b), S(b,c), T(b,d), U(b,e).

(a) For the following SQL query, give two equivalent logical plans in relational algebra such that one
is likely to be more efficient than the other. Indicate which one is likely to be more efficient.
Explain.
SELECT R.a
FROM R, S
WHERE R.b = S.b
AND S.c = 3
Solution:

i. πa (σc=3 (R ⋈b=b S))

ii. πa (R ⋈b=b σc=3 (S))
Plan ii is likely to be more efficient.
With the select operator applied first, fewer tuples need to be joined.
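
A toy illustration of the difference, using made-up instances of R and S (the data and function names are hypothetical): both plans return the same answer, but plan ii feeds fewer S tuples into the join.

```python
# Hypothetical toy instances of R(a, b) and S(b, c).
R = [(1, 10), (2, 20), (3, 30), (4, 10)]
S = [(10, 3), (20, 5), (30, 3), (40, 3), (50, 7)]

def plan_i(R, S):
    """Join first, then select c = 3. Returns (result, S tuples fed to the join)."""
    joined = [(a, b, c) for (a, b) in R for (sb, c) in S if b == sb]
    return [a for (a, b, c) in joined if c == 3], len(S)

def plan_ii(R, S):
    """Select c = 3 first, then join. Returns (result, S tuples fed to the join)."""
    s3 = [(sb, c) for (sb, c) in S if c == 3]
    return [a for (a, b) in R for (sb, c) in s3 if b == sb], len(s3)

result_i, fed_i = plan_i(R, S)
result_ii, fed_ii = plan_ii(R, S)
print(result_i == result_ii)   # True: the plans are equivalent
print(fed_i, fed_ii)           # 5 3
```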

(b) Recall that a left-deep plan is typically favored by optimizers. Write a left-deep plan for the
following SQL query. You may either draw the plan as a tree or give the relational algebra
expression. If you use relational algebra, be sure to use parentheses to indicate the order that the
joins should be performed.
SELECT *
FROM R, S, T, U
WHERE R.b = S.b
AND S.b = T.b
AND T.b = U.b
Solution:
((R ⋈b=b S) ⋈b=b T) ⋈b=b U

(c) Physical plans. Assume that all tables are clustered on the attribute b, and there are no secondary
indexes. All tables are large. Do not assume that any of the relations fit in memory.
For the left-deep plan you gave in (b), suggest an efficient physical plan.
Specify the physical join operators used (hash, nested loop, sortmerge, etc.) and the access
methods used to read the tables (sequential scan, index, etc.). Explain why your plan is efficient.
For operations where it matters, be sure to include the details — for instance, for a hash join,
which relation would be stored in the hash tables; for a loop join, which relation would be the
inner or outer loop. You should specify how the topmost join reads the result of the lower one.
Solution:
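Join order does not matter. Use a sort-merge join for every join and a sequential scan for each of
R, S, T, and U. Because every table is clustered on b, the inputs are already sorted on the join
attribute, so the merge needs no sort phase and the plan is fully pipelined.
A “clustered index scan” instead of a sequential scan is also correct.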
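
The merge phase of such a plan can be sketched as a Python generator. This is a minimal in-memory illustration, assuming both inputs arrive already sorted on b; the function name and the key-extractor parameters are our own, and a real system would operate on pages rather than Python tuples.

```python
from itertools import groupby
from operator import itemgetter

def merge_join(left, right, lkey, rkey):
    """Yield l + r for every pair with lkey(l) == rkey(r).

    Both inputs must already be sorted on their join key. Each input is read
    once, and results are produced lazily, so joins can be chained (pipelined).
    """
    lgroups = groupby(left, lkey)
    rgroups = groupby(right, rkey)
    lk, lg = next(lgroups, (None, None))
    rk, rg = next(rgroups, (None, None))
    while lg is not None and rg is not None:
        if lk < rk:
            lk, lg = next(lgroups, (None, None))
        elif lk > rk:
            rk, rg = next(rgroups, (None, None))
        else:
            # Buffer only the right-hand tuples that share the current key.
            rrows = list(rg)
            for l in lg:
                for r in rrows:
                    yield l + r
            lk, lg = next(lgroups, (None, None))
            rk, rg = next(rgroups, (None, None))

# Hypothetical toy data: R(a, b), S(b, c), T(b, d), all sorted on b.
R = [(1, 1), (2, 1), (3, 2)]
S = [(1, "x"), (2, "y"), (2, "z"), (3, "w")]
T = [(1, "p"), (2, "q")]

rs = merge_join(R, S, itemgetter(1), itemgetter(0))     # R.b = S.b
rst = merge_join(rs, T, itemgetter(1), itemgetter(0))   # b of (R join S) = T.b
for row in rst:
    print(row)
```

Because the lower join is a generator, the upper join consumes its tuples as they are produced, which is the in-memory analogue of the pipelining described above.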

(d) For the physical plan you wrote for (c), give the estimated cost in terms of B(...), V (...), and
T (...). Explain each term in your expression.
Solution:
B(R) + B(S) + B(T ) + B(U ). Just need to read each table once.
