Query Processing and Optimization: Dessalegn Mequanint
Overview
• Querying, and algorithms to evaluate queries
• Declarative query versus Algebraic query
How does the DBMS execute queries?
• A DBMS scans an input query, parses, validates, and
evaluates it by accessing the actual data, finally
presenting the results.
• Example: Given the relations:
– EMPLOYEE(FNAME, MINIT, LNAME, SSN, BDATE,
ADDRESS, SEX, SALARY, SUPERSSN, DNO)
– WORKS_ON(ESSN, PNO, HOURS)
– get the names of employees who work on project No. 3:
• SELECT EMPLOYEE.LNAME FROM EMPLOYEE, WORKS_ON
WHERE EMPLOYEE.SSN = WORKS_ON.ESSN AND
WORKS_ON.PNO = '3';
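The query above can be tried end to end with a small sketch that uses Python's built-in sqlite3 module. The tables are cut down to the columns the query touches, and the three employees and their assignments are made-up sample rows, not data from the slides.

```python
import sqlite3

# In-memory database; the rows below are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMPLOYEE (LNAME TEXT, SSN TEXT PRIMARY KEY);
    CREATE TABLE WORKS_ON (ESSN TEXT, PNO TEXT, HOURS REAL);
    INSERT INTO EMPLOYEE VALUES ('Smith', '111'), ('Wong', '222'), ('Zelaya', '333');
    INSERT INTO WORKS_ON VALUES ('111', '3', 10.0), ('222', '1', 20.0), ('333', '3', 5.0);
""")

# The DBMS scans, parses, validates, optimises and evaluates this text.
rows = conn.execute("""
    SELECT EMPLOYEE.LNAME
    FROM EMPLOYEE, WORKS_ON
    WHERE EMPLOYEE.SSN = WORKS_ON.ESSN AND WORKS_ON.PNO = '3'
""").fetchall()

names = sorted(row[0] for row in rows)
print(names)  # ['Smith', 'Zelaya']
conn.close()
```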
Query Processing
• Scanner: identifies the tokens (language components).
– In the above example SELECT, FROM, and so on are all
tokens.
• Parser: verifies the query syntax to make sure the
syntax rules are obeyed.
• Validation: checks that all attribute and relation
names are valid and semantically meaningful.
– For example, SELECT EMPLOYEE.ESSN FROM ...
would be invalid since ESSN does not exist in the
EMPLOYEE relation.
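The scanning and validation steps can be illustrated with a toy sketch. It is not how any particular DBMS is implemented: the token pattern, the SCHEMA dictionary and the function names scan and invalid_references are all assumptions made for this example, keywords are not distinguished from other identifiers, and only qualified names of the form RELATION.ATTRIBUTE are checked.

```python
import re

# Catalogue for the two relations from the example (hypothetical structure).
SCHEMA = {
    "EMPLOYEE": {"FNAME", "MINIT", "LNAME", "SSN", "BDATE",
                 "ADDRESS", "SEX", "SALARY", "SUPERSSN", "DNO"},
    "WORKS_ON": {"ESSN", "PNO", "HOURS"},
}

TOKEN_RE = re.compile(
    r"\s*(?:(?P<ident>[A-Za-z_]\w*)|(?P<number>\d+)"
    r"|(?P<string>'[^']*')|(?P<symbol>[.,=;*()<>]))"
)

def scan(sql):
    """Scanner: split the query text into (kind, text) tokens."""
    text = sql.strip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append((match.lastgroup, match.group(match.lastgroup)))
        pos = match.end()
    return tokens

def invalid_references(tokens):
    """Validation: report every REL.ATTR that is not in the catalogue."""
    errors = []
    for (k1, rel), (_, dot), (k3, attr) in zip(tokens, tokens[1:], tokens[2:]):
        if k1 == "ident" and dot == "." and k3 == "ident":
            if rel not in SCHEMA:
                errors.append(f"unknown relation {rel}")
            elif attr not in SCHEMA[rel]:
                errors.append(f"{rel} has no attribute {attr}")
    return errors

good = scan("SELECT EMPLOYEE.LNAME FROM EMPLOYEE, WORKS_ON "
            "WHERE EMPLOYEE.SSN = WORKS_ON.ESSN AND WORKS_ON.PNO = '3';")
bad = scan("SELECT EMPLOYEE.ESSN FROM EMPLOYEE")
print(invalid_references(good))  # []
print(invalid_references(bad))   # ['EMPLOYEE has no attribute ESSN']
```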
Query Processing ...
• Generate a query tree: The internal representation of the
query is usually a tree or graph form, constructed from
bottom to top.
[Figure: query tree with a node labelled "Project LNAME attribute"]
• σTitle="DBMS"(BOOK)
[Figure: result table with a Title column whose rows all contain DBMS]
Selection ( σ)…
• σ <Cond1> and <Cond2> and ….
• σ <Cond1> or <Cond2> and ….
• σ <Cond1> or <Cond2> or ….
Projection ( π)
• Deletes attributes that are not in projection list.
• Schema of result contains exactly the fields in
the projection list, with the same names that
they had in the (only) input relation. ( Unary
Operation)
• Projection operator has to eliminate duplicates!
– as it returns a relation which is a set
Projection ( π)…
• πTitle(BOOK)
• πIDNo, FName(Student)
• πEmpno, Fname, Sname, Salary(Employee)
Nesting Selection in Projection
• πAcc-no (σTitle="DBMS" (BOOK))
• πIDNo, FName(σCGPA>=3.25 (Student))
• πEmpno, Fname, Sname, Salary(σSalary>=10000 (Employee))
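Selection, projection and their nesting can be mimicked on a list of dictionaries. This is only a sketch: the helper names select and project and the three Student rows are invented for illustration. Note that project removes duplicates, as a set-based projection must.

```python
def select(relation, predicate):
    """Selection: keep the rows that satisfy the condition."""
    return [row for row in relation if predicate(row)]

def project(relation, attributes):
    """Projection: keep only the listed attributes and drop duplicate rows."""
    seen = set()
    result = []
    for row in relation:
        key = tuple(row[a] for a in attributes)
        if key not in seen:
            seen.add(key)
            result.append(dict(zip(attributes, key)))
    return result

# Hypothetical sample data.
student = [
    {"IDNo": 1, "FName": "Abebe", "CGPA": 3.5},
    {"IDNo": 2, "FName": "Sara", "CGPA": 3.0},
    {"IDNo": 3, "FName": "Abebe", "CGPA": 3.25},
]

# Projection of IDNo, FName over the selection CGPA >= 3.25 on Student
honours = project(select(student, lambda r: r["CGPA"] >= 3.25), ["IDNo", "FName"])
print(honours)      # [{'IDNo': 1, 'FName': 'Abebe'}, {'IDNo': 3, 'FName': 'Abebe'}]

# Projection of FName alone: the two 'Abebe' rows collapse into one.
first_names = project(student, ["FName"])
print(first_names)  # [{'FName': 'Abebe'}, {'FName': 'Sara'}]
```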
Equality Joins With One Join Column
• Three forms of outer join:
– Left outer join (⟕): tuples of the left relation that find no match
in the natural join are also added to the result, with null values in
the fields of the right relation.
– Right outer join (⟖): tuples of the right relation that find no match
in the natural join are also added to the result, with null values in
the fields of the left relation.
– Full outer join (⟗): unmatched tuples of both relations are added to
the result, padded with null values.
• select * from employee e1, department d1 where e1.did =
d1.did
• In algebra: R ⋈ S. It is so commonly used that it must be
carefully optimised. R × S is large; so, R × S followed by a
selection is inefficient.
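The difference between "product, then select" and a direct equijoin can be seen in a short sketch. The hash-based join here is one common technique chosen for illustration, not something the slides prescribe, and the function names and sample rows are hypothetical. The last function shows a left outer join padding unmatched rows with None in place of null.

```python
def product_then_select(r, s, key):
    """Build the full Cartesian product, then filter on the join column."""
    pairs = [(a, b) for a in r for b in s]
    return [{**a, **b} for a, b in pairs if a[key] == b[key]], len(pairs)

def build_index(s, key):
    """Group the rows of s by their join-column value."""
    index = {}
    for b in s:
        index.setdefault(b[key], []).append(b)
    return index

def hash_join(r, s, key):
    """Equijoin that never materialises the Cartesian product."""
    index = build_index(s, key)
    return [{**a, **b} for a in r for b in index.get(a[key], [])]

def left_outer_join(r, s, key, s_attrs):
    """Like hash_join, but keeps unmatched rows of r, padded with None."""
    index = build_index(s, key)
    result = []
    for a in r:
        matches = index.get(a[key])
        if matches:
            result.extend({**a, **b} for b in matches)
        else:
            result.append({**a, **{attr: None for attr in s_attrs}})
    return result

# Hypothetical sample data.
employee = [{"ename": "Ali", "did": 1}, {"ename": "Hana", "did": 2},
            {"ename": "Musa", "did": 9}]
department = [{"did": 1, "dname": "HR"}, {"did": 2, "dname": "IT"}]

slow, product_size = product_then_select(employee, department, "did")
fast = hash_join(employee, department, "did")
outer = left_outer_join(employee, department, "did", ["dname"])

print(product_size, len(fast))  # 6 2
print(outer[-1])                # {'ename': 'Musa', 'did': 9, 'dname': None}
```

Both routes return the same two rows, but the first one builds six intermediate pairs to get there; the gap grows with the product of the relation sizes.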
Query Optimization
• DBMS Architecture
Query Optimization…
• Optimiser Architecture
Benefits of Query Optimisation
• We know how to evaluate queries.
– So why is there a need to optimise?
• The query language is declarative: the user specifies the required
result,
• not the details of how to go about obtaining it, and this leaves
room for query optimisation.
• Query optimisation is necessary for high-level relational languages.
• The term "optimal solution" is often used for the best solution obtained:
– under given constraints, the cost of finding the true optimal
solution may be too high, so we often settle for a non-optimal
solution.
Advantages of Having a Query
Optimiser
• The optimiser can take advantage of information not
available to the programmer such as database
statistics
• Changes to the database, such as addition of an
index, do not require queries to be reprogrammed.
– The optimiser need only calculate a new execution plan.
• The execution plan is the result of "intelligence"
built into the optimisers and not dependent on the
capability of the individual programmer
Example of Query Optimisation
• SELECT EMPLOYEE.LNAME
FROM EMPLOYEE, WORKS_ON
WHERE EMPLOYEE.SSN = WORKS_ON.ESSN
AND WORKS_ON.PNO = 3;
• Suppose there are:
100 employees
200 WORKS_ON entries
10 of which are PNO = 3
Example…
• Solution 1: Take the cartesian product (×)
of EMPLOYEE and WORKS_ON.
– This will involve reading 100 + 200 tuples and writing 20,000
tuples.
– Restrict this result by the where clause: read in the 20,000
tuples to give the final result of 10 tuples.
• Solution 2: Apply WORKS_ON.PNO = 3 condition first.
– This involves reading 200 tuples and writing 10 tuples
(where PNO = 3).
– Perform the join operation on the above result with EMPLOYEE:
read 100 tuples, giving a result of 10 tuples.
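The two solutions can be compared by adding up the tuple reads and writes listed above. This is only a tally of the figures given in the example; it counts exactly the steps the slide mentions and leaves out the final 10-tuple result in both cases.

```python
employees = 100   # EMPLOYEE tuples
works_on = 200    # WORKS_ON tuples
matching = 10     # WORKS_ON tuples with PNO = 3

# Solution 1: Cartesian product first, then the WHERE clause.
product_size = employees * works_on            # 20,000 tuples written
solution1 = (employees + works_on) + product_size + product_size
#            read both inputs      write product  read product back

# Solution 2: apply PNO = 3 first, then join with EMPLOYEE.
solution2 = works_on + matching + employees
#           read 200   write 10   read 100

print(solution1)              # 40300
print(solution2)              # 310
print(solution1 / solution2)  # 130.0
```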
Overview of Query Optimisation
• Plan: Tree of relational algebra operations, with choice of
algorithm for each operation. Each operator is typically
implemented using a "pull" interface: when an operator is
"pulled" for the next output tuples, it "pulls" on its inputs and
computes them.
• Two main issues:
– For a given query, what plans are considered? (We need algorithms
to search the plan space for cheapest [estimated] plan.)
– How is the cost of a plan estimated?
• Ideally: Want to find best plan. Practically: Avoid worst plans!
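The "pull" interface described above maps naturally onto Python generators: each operator asks its input for the next tuple only when its own consumer asks for one. The sketch below is an illustration, not a description of a specific DBMS; the operator names, the pulled log and the sample rows are assumptions, and project keeps duplicates for brevity.

```python
pulled = []  # records which rows the scan has actually produced

def scan(table):
    """Leaf operator: hands out one stored row per pull."""
    for row in table:
        pulled.append(row["sid"])
        yield row

def select(child, predicate):
    """Pulls on its input until a row satisfies the predicate."""
    for row in child:
        if predicate(row):
            yield row

def project(child, attributes):
    """Pulls one row from its input and keeps the listed attributes."""
    for row in child:
        yield tuple(row[a] for a in attributes)

# Hypothetical sample data.
sailors = [
    {"sid": 1, "sname": "Dustin", "rating": 7},
    {"sid": 2, "sname": "Lubber", "rating": 3},
    {"sid": 3, "sname": "Rusty", "rating": 10},
]

# Plan tree: project(sname) over select(rating > 5) over scan(sailors)
plan = project(select(scan(sailors), lambda r: r["rating"] > 5), ["sname"])

first = next(plan)                # pulling once reads only one stored row
pulled_after_first = list(pulled)
rest = list(plan)                 # draining the plan reads the remaining rows

print(first, pulled_after_first)  # ('Dustin',) [1]
print(rest, pulled)               # [('Rusty',)] [1, 2, 3]
```

Nothing is read from the table until the root of the plan is pulled, and no intermediate result is materialised between operators.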
Example
• Schemas:
– Sailor(sid: int, sname: string, rating: int, age: real)
– Reserves(sid: int, bid: int, day: dates, rname: string)
– Reserves: Each tuple is 40 bytes long, 100 tuples per
page, 1000 pages.
– Sailors: Each tuple is 50 bytes long, 80 tuples per page,
500 pages.
• Query:
– select S.sname from reserves R, sailor S where S.sid =
R.sid AND R.bid=100 AND S.rating > 5
Example…
• Relational Algebra tree:
M + pr*M*N
Example…
• Plan:
Cost: 500 + 500 × 1000 I/Os
• By no means the worst plan!
• Misses several opportunities: selections could have been
"pushed" earlier, no use is made of any available indexes,
and so on.
• Goal of optimisation: To find more efficient plans that
compute the same answer.
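The cost expression for this plan can be evaluated directly from the page counts in the schema. Reading it as a page-at-a-time nested loops join with Sailors as the outer relation is an interpretation, not something the slide states: each of the 500 Sailors pages is read once, and all 1000 Reserves pages are scanned for each of them.

```python
sailors_pages = 500    # outer relation (assumed reading of the expression)
reserves_pages = 1000  # inner relation

# Read every outer page once, and scan the whole inner relation per outer page.
cost = sailors_pages + sailors_pages * reserves_pages
print(cost)  # 500500
```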
Example…
• Alternative Plans 1 (No Indexes)
– Main difference: pushes selects.
– Total cost is 3,560 page I/Os.
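The slide gives only the total, so the breakdown below is an assumption: it is one set of inputs that reproduces 3,560 page I/Os. It supposes that bid = 100 keeps 1 in 100 Reserves pages, that rating > 5 keeps half of the Sailors pages, that both selections are written to temporary relations T1 and T2, and that these are joined by sort-merge with two sorting passes over T1 and three over T2. None of these figures appears in the source.

```python
reserves_pages = 1000
sailors_pages = 500

# Assumed selectivities (not given on the slide).
t1_pages = reserves_pages // 100   # bid = 100 keeps 1 page in 100 -> 10
t2_pages = sailors_pages // 2      # rating > 5 keeps half         -> 250

# Assumed number of sorting passes; each pass reads and writes every page.
t1_passes, t2_passes = 2, 3

select_cost = (reserves_pages + t1_pages) + (sailors_pages + t2_pages)
sort_cost = 2 * t1_passes * t1_pages + 2 * t2_passes * t2_pages
merge_cost = t1_pages + t2_pages

total = select_cost + sort_cost + merge_cost
print(select_cost, sort_cost, merge_cost)  # 1760 1540 260
print(total)                               # 3560
```

Whatever the exact breakdown, the total is far below the 500 + 500 × 1000 page I/Os of the plan without pushed selections, because the join now runs over 10 and 250 pages instead of 1000 and 500.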
Exercise
• Alternative Plans 2 (With Indexes)