Query Processing and Optimization: Dessalegn Mequanint

The document summarizes query processing and optimization. It discusses how a DBMS executes queries by scanning, parsing, validating, and evaluating queries to access and present data. It describes the main steps in query processing as scanning, parsing, validation, generating a query tree, and obtaining results by traversing the tree. The document also discusses relational operations, selection, projection, joins, and how query optimization improves performance by choosing better execution plans.
Query Processing and Optimization
Dessalegn Mequanint
Overview
• Querying, and algorithms to evaluate queries
• Declarative query versus Algebraic query
How does the DBMS execute queries?
• A DBMS scans an input query, parses, validates, and
evaluates it by accessing the actual data, finally
presenting the results.
• Example: Given the relations:
– EMPLOYEE(FNAME, MINIT, LNAME, SSN, BDATE,
ADDRESS, SEX, SALARY, SUPERSSN, DNO)
WORKS_ON(ESSN, PNO, HOURS)
– get the names of employees who work on project No. 3:
• SELECT EMPLOYEE.LNAME FROM EMPLOYEE, WORKS_ON
WHERE EMPLOYEE.SSN = WORKS_ON.ESSN AND
WORKS_ON.PNO = 3;
Query Processing
• Scanner: identifies the tokens (language components).
– In the above example SELECT, FROM, and so on are all
tokens.
• Parser: verifies the query syntax to make sure the
syntax rules are obeyed.
• Validation: checks that all attribute and relation
names are valid and semantically meaningful.
– SELECT EMPLOYEE.ESSN FROM ... would be invalid, since
ESSN does not exist in the EMPLOYEE relation.
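The scanner and validation steps can be sketched in a few lines of Python. This is only an illustration, not how a real DBMS is built: the CATALOG dictionary, the token pattern, and the function names scan and validate are assumptions made for this example, and the parser stage is skipped.

```python
import re

# Hypothetical catalog: relation name -> attribute names (taken from the example schema).
CATALOG = {
    "EMPLOYEE": {"FNAME", "MINIT", "LNAME", "SSN", "BDATE",
                 "ADDRESS", "SEX", "SALARY", "SUPERSSN", "DNO"},
    "WORKS_ON": {"ESSN", "PNO", "HOURS"},
}

# A token is a plain or qualified identifier, a number, or a punctuation symbol.
TOKEN = re.compile(r"[A-Za-z_][A-Za-z_0-9]*(?:\.[A-Za-z_][A-Za-z_0-9]*)?|\d+|[=,;*<>]")

def scan(query):
    """Scanner: split the query text into tokens."""
    return TOKEN.findall(query)

def validate(tokens):
    """Validation: return the qualified names that do not exist in the catalog."""
    errors = []
    for tok in tokens:
        if "." in tok:
            relation, attribute = tok.upper().split(".")
            if attribute not in CATALOG.get(relation, set()):
                errors.append(tok)
    return errors

good = scan("SELECT EMPLOYEE.LNAME FROM EMPLOYEE, WORKS_ON "
            "WHERE EMPLOYEE.SSN = WORKS_ON.ESSN AND WORKS_ON.PNO = 3;")
bad = scan("SELECT EMPLOYEE.ESSN FROM EMPLOYEE;")
print(validate(good))  # []
print(validate(bad))   # ['EMPLOYEE.ESSN']
```

The second query is rejected for the reason given above: ESSN is an attribute of WORKS_ON, not of EMPLOYEE.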
Query Processing ...
• Generate a query tree: The internal representation of the
query is usually a tree or graph form, constructed from
bottom to top.
Project LNAME attribute
          |
Filter for PNO = 3
          |
Join on ESSN = SSN
     /          \
Table WORKS_ON   Table EMPLOYEE


• Results are obtained by going through the steps in the tree
Relational Operations
• There are methods for carrying out all relational
operations:
– Selection ( σ)—selects a subset of rows from relation
– Projection ( π)—deletes unwanted columns from
relation
– Set-difference ( −)—tuples in relation 1, but not in
relation 2
– Union ( ∪)—tuples in relation 1 together with tuples in
relation 2
– Aggregation (such as SUM, MIN, etc.) and group by
Selection ( σ)
• Selects rows that satisfy selection condition.
• No duplicates in result! (Why?)
• Schema of result identical to schema of (only)
input relation
• Result relation can be the input for another
relational algebra operation! (Operator
composition.)
Selection ( σ)…
• σAcc-no>300(BOOK) =
Acc-No
400
500

• σTitle="DBMS"(BOOK) =
Title
DBMS
DBMS
Selection ( σ)…
• σ <Cond1> and <Cond2> and ….
• σ <Cond1> or <Cond2> and ….
• σ <Cond1> or <Cond2> or ….
Projection ( π)
• Deletes attributes that are not in projection list.
• Schema of result contains exactly the fields in
the projection list, with the same names that
they had in the (only) input relation. ( Unary
Operation)
• Projection operator has to eliminate duplicates!
– as it returns a relation which is a set
Projection ( π)…
• πTitle(BOOK)
• πIDNo, FName(Student)
• πEmpno, Fname, Sname, Salary(Employee)
Nesting Selection in Projection
• πAcc-no (σTitle="DBMS" (BOOK))
• πIDNo, FName(σCGPA>=3.25 (Student))
• πEmpno, Fname, Sname, Salary(σSalary>=10000 (Employee))
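Selection, projection, and their nesting can be tried out on a small relation stored as a Python set of tuples. This is a minimal sketch: the BOOK rows and the helper names select and project are made up for the example. Using a set is what removes duplicates after projection.

```python
# Hypothetical BOOK relation as a set of (acc_no, title) tuples; the rows are made up.
BOOK = {(100, "Algebra"), (200, "DBMS"), (400, "DBMS"), (500, "Networks")}

ACC_NO, TITLE = 0, 1  # column positions

def select(relation, predicate):
    """Selection: keep the rows that satisfy the predicate (schema unchanged)."""
    return {row for row in relation if predicate(row)}

def project(relation, *columns):
    """Projection: keep only the listed columns; the set removes duplicates."""
    return {tuple(row[c] for c in columns) for row in relation}

# Nesting: project Acc-no over the rows selected by Title = "DBMS".
dbms_books = select(BOOK, lambda row: row[TITLE] == "DBMS")
print(sorted(project(dbms_books, ACC_NO)))  # [(200,), (400,)]

# Projection alone: two DBMS rows collapse into one result tuple.
print(sorted(project(BOOK, TITLE)))  # [('Algebra',), ('DBMS',), ('Networks',)]
```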
Equality Joins With One Join Column
• Three forms of outer join:
– Left outer join (⟕): tuples of the left relation that find no match in
the natural join are also added to the result, with null values in the
fields of the right relation.
– Right outer join (⟖): tuples of the right relation that find no match in
the natural join are also added to the result, with null values in the
fields of the left relation.
– Full outer join (⟗): unmatched tuples of both relations are added to the
result, padded with null values.
• select * from employee e1, department d1 where e1.did =
d1.did
• In algebra: R ⋈ S. It is so commonly used that it must be
carefully optimised. R × S is large; so, R × S followed by a
selection is inefficient.
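The difference between "product, then selection" and a direct equality join can be seen on a few rows. The sketch below is an illustration under assumptions: the rows are invented, and the hash-based join is one common way to evaluate an equality join, not a method described in these slides. The last function shows the null padding of a left outer join.

```python
# Hypothetical rows: employee = (name, did), department = (did, dname).
employee = [("Abebe", 1), ("Sara", 2), ("Lidya", 1), ("Yonas", 3)]
department = [(1, "Sales"), (2, "HR")]

def product_then_select(r, s):
    """R x S followed by a selection: examines len(r) * len(s) pairs."""
    pairs = [(e, d) for e in r for d in s]  # full Cartesian product
    return [e + d for e, d in pairs if e[1] == d[0]], len(pairs)

def build_index(s):
    """Group the rows of s by their join column (did)."""
    index = {}
    for d in s:
        index.setdefault(d[0], []).append(d)
    return index

def hash_join(r, s):
    """Equality join: index s once, then probe the index once per row of r."""
    index = build_index(s)
    return [e + d for e in r for d in index.get(e[1], [])]

def left_outer_join(r, s):
    """Like hash_join, but unmatched rows of r are kept and padded with None."""
    index = build_index(s)
    return [e + d for e in r for d in index.get(e[1], [(None, None)])]

joined, pairs_examined = product_then_select(employee, department)
print(pairs_examined)                             # 8
print(joined == hash_join(employee, department))  # True
print(left_outer_join(employee, department)[-1])  # ('Yonas', 3, None, None)
```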
Query Optimization
• DBMS Architecture
Query Optimization…
• Optimiser Architecture
Benefits of Query Optimisation
• We know how to evaluate queries.
– So why is there a need to optimise?
• the query language is declarative, that is, the user specifies the
required result.
• the user does not specify the details of how to go about obtaining the
required result and, therefore, there is opportunity for query
optimisation.
• query optimisation is necessary for high level relational languages.
• the term optimal solution is often used for the best solution obtained:
– under the given constraints, the cost of finding the true optimal
solution may be too high, so we often settle for a non-optimal
solution.
Advantages of Having a Query
Optimiser
• The optimiser can take advantage of information not
available to the programmer such as database
statistics
• Changes to the database, such as addition of an
index, do not require queries to be reprogrammed.
– The optimiser need only calculate a new execution plan
• The execution plan is the result of "intelligence"
built into the optimisers and not dependent on the
capability of the individual programmer
Example of Query Optimisation
• SELECT EMPLOYEE.LNAME
FROM EMPLOYEE, WORKS_ON
WHERE EMPLOYEE.SSN = WORKS_ON.ESSN
AND WORKS_ON.PNO = 3;
• Suppose there are:
100 employees
200 WORKS_ON entries
10 of which are PNO = 3
Example…
• Solution 1: Take the Cartesian product (×)
of EMPLOYEE and WORKS_ON.
– This will involve reading 100 + 200 tuples and writing 20,000
tuples.
– Restrict this result by the where clause: read in the 20,000
tuples to give the final result of 10 tuples.
• Solution 2: Apply WORKS_ON.PNO = 3 condition first.
– This involves reading 200 tuples and writing 10 tuples
(where PNO = 3). Perform join operation on the above result
with EMPLOYEE: read 100 tuples giving result of 10 tuples.
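The two solutions can be compared by adding up the tuples read and written in each step. This is a rough tally using only the counts stated above; it assumes every tuple read or written costs the same and ignores writing the final 10-tuple result.

```python
# Figures from the example: 100 employees, 200 WORKS_ON rows, 10 rows with PNO = 3.
employees, works_on, matching = 100, 200, 10

# Solution 1: Cartesian product first, then apply the WHERE clause.
s1_read_inputs = employees + works_on     # read both relations
s1_write_product = employees * works_on   # write the 20,000-tuple product
s1_read_product = s1_write_product        # read the product back to filter it
solution_1 = s1_read_inputs + s1_write_product + s1_read_product

# Solution 2: filter WORKS_ON first, then join the 10 survivors with EMPLOYEE.
solution_2 = works_on + matching + employees  # read 200, write 10, read 100

print(solution_1)  # 40300
print(solution_2)  # 310
```

Under this simple count, applying the selection first moves roughly 130 times fewer tuples.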
Overview of Query Optimisation
• Plan: Tree of relational algebra operations, with choice of
algorithm for each operation. Each operator is typically
implemented using a "pull" interface: when an operator is
"pulled" for the next output tuples, it "pulls" on its inputs and
computes them.
• Two main issues:
– For a given query, what plans are considered? (We need algorithms
to search the plan space for cheapest [estimated] plan.)
– How is the cost of a plan estimated?
• Ideally: Want to find best plan. Practically: Avoid worst plans!
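The "pull" interface mentioned above maps naturally onto Python generators: each operator asks its input for the next row only when it is itself asked for one. The sketch below is a minimal illustration; the operator names and the WORKS_ON rows are assumptions made for the example, and this project does not eliminate duplicates.

```python
def scan(table):
    """Leaf operator: yields the rows of a stored table one at a time."""
    for row in table:
        yield row

def select(child, predicate):
    """Pulls rows from its input and passes on those that satisfy the predicate."""
    for row in child:
        if predicate(row):
            yield row

def project(child, *keys):
    """Pulls rows from its input and keeps only the requested attributes."""
    for row in child:
        yield tuple(row[k] for k in keys)

# Hypothetical WORKS_ON rows.
works_on = [
    {"ESSN": "111", "PNO": 3, "HOURS": 10},
    {"ESSN": "222", "PNO": 1, "HOURS": 20},
    {"ESSN": "333", "PNO": 3, "HOURS": 5},
]

# Plan: project ESSN over (select PNO = 3 over (scan WORKS_ON)).
plan = project(select(scan(works_on), lambda r: r["PNO"] == 3), "ESSN")
print(next(plan))  # ('111',)
print(list(plan))  # [('333',)]
```

Nothing is computed until the top operator is pulled, so no intermediate result has to be stored between operators.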
Example
• Schemas:
– Sailor(sid: int, sname: string, rating: int, age: real)
– Reserves(sid: int, bid: int, day: date, rname: string)
– Reserves: Each tuple is 40 bytes long, 100 tuples per
page, 1000 pages.
– Sailor: Each tuple is 50 bytes long, 80 tuples per page,
500 pages.
• Query:
– select S.sname from reserves R, sailor S where S.sid =
R.sid AND R.bid=100 AND S.rating > 5
Example…
• Relational Algebra tree:

M + pr*M*N
Example…
• Plan:
Cost: 500 + 500 × 1000 I/Os
• By no means the worst plan!
• Misses several opportunities: selections could have been "pushed"
earlier, no use is made of any available indexes, and so on.
• Goal of optimisation: To find more efficient plans that compute
the same answer.
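The stated cost of 500 + 500 × 1000 I/Os matches a nested loops join that rescans the inner relation once per page of the outer relation. The arithmetic below is a sketch under assumptions the slides do not spell out: Sailor (500 pages) is taken as the outer relation, Reserves (1000 pages) as the inner, and the symbols in M + pr*M*N are read as M = outer pages, N = inner pages, pr = tuples per outer page.

```python
M = 500    # pages of Sailor, taken as the outer relation (an assumption)
N = 1000   # pages of Reserves, taken as the inner relation (an assumption)
p_r = 80   # tuples per page of the outer relation

page_at_a_time = M + M * N         # scan the inner once per outer page
tuple_at_a_time = M + p_r * M * N  # scan the inner once per outer tuple

print(page_at_a_time)   # 500500
print(tuple_at_a_time)  # 40000500
```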
Example…
• Alternative Plans 1 (No Indexes)
– Main difference: pushes selects.
– Total cost is 3,560 page I/Os
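The plan diagram did not survive, so the breakdown behind 3,560 is not shown here. One set of assumptions that reproduces the figure is sketched below. None of these numbers is stated in the slides: 100 distinct boats and 10 rating values, both uniformly distributed; each selection written to a temporary relation; and a sort-merge join that sorts the two temporaries in 2 and 3 passes.

```python
reserves_pages, sailor_pages = 1000, 500

# Assumed selectivities: bid = 100 keeps 1 boat in 100; rating > 5 keeps half the ratings.
t1 = reserves_pages // 100  # 10 pages of selected Reserves tuples
t2 = sailor_pages // 2      # 250 pages of selected Sailor tuples

# Scan each input and write its selected tuples to a temporary relation.
select_cost = (reserves_pages + t1) + (sailor_pages + t2)

# External sort: each pass reads and writes every page; 2 and 3 passes are assumed.
sort_cost = 2 * 2 * t1 + 2 * 3 * t2

# Merge step: one read of each sorted temporary.
merge_cost = t1 + t2

total = select_cost + sort_cost + merge_cost
print(total)  # 3560
```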
Exercise
• Alternative Plans 2 (With Indexes)

– Total: 1,210 I/Os
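As a hint for the exercise, the total of 1,210 can be reproduced under the following assumptions, none of which is stated in the slides: 100 distinct boats, a clustered index on Reserves.bid, and an index on Sailor.sid that costs about 1.2 I/Os per lookup.

```python
reserves_tuples = 100 * 1000        # 100 tuples per page, 1000 pages
matching = reserves_tuples // 100   # 100 boats assumed: 1000 tuples with bid = 100
matching_pages = matching // 100    # clustered index assumed: they fill 10 pages

probe_cost = 1.2                    # assumed average I/Os per lookup on Sailor.sid
total = matching_pages + matching * probe_cost

print(round(total))  # 1210
```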


Reading Assignment
• What is System R or System R approach?
Using Heuristic in Query Optimization
• The following is a brief outline of the transformation steps which
will lead to an optimised tree that is more efficient to execute.
• The main idea is to apply first the operations that reduce the size of
intermediate results.
– Break up SELECT operation with conjunctive condition into a cascade
of SELECT operations.
– Move SELECT operations as far down the tree as possible.
– Rearrange leaf nodes of the tree so that relations with the most
restrictive SELECT operations are executed first.
– Combine (Cartesian PRODUCT followed by SELECT) into a JOIN operation
where possible.
– Move PROJECT as far down the tree as possible (breaking up the condition
first if necessary).
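The rule "move SELECT operations as far down the tree as possible" can be sketched as a rewrite on a small query tree. This is a simplified illustration, not an optimiser: the classes Relation, Select, and Join and the function push_down are hypothetical, and each SELECT condition is assumed to test attributes of a single relation.

```python
from dataclasses import dataclass

@dataclass
class Relation:
    name: str

@dataclass
class Select:
    relation: str   # the one relation whose attributes the condition tests
    condition: str
    child: object

@dataclass
class Join:
    condition: str
    left: object
    right: object

def relations_in(node):
    """Names of the base relations found under a node."""
    if isinstance(node, Relation):
        return {node.name}
    if isinstance(node, Select):
        return relations_in(node.child)
    return relations_in(node.left) | relations_in(node.right)

def push_down(node):
    """Move each SELECT below a JOIN, onto the side that contains its relation."""
    if isinstance(node, Relation):
        return node
    if isinstance(node, Join):
        return Join(node.condition, push_down(node.left), push_down(node.right))
    child = push_down(node.child)
    if isinstance(child, Join):
        if node.relation in relations_in(child.left):
            moved = push_down(Select(node.relation, node.condition, child.left))
            return Join(child.condition, moved, child.right)
        if node.relation in relations_in(child.right):
            moved = push_down(Select(node.relation, node.condition, child.right))
            return Join(child.condition, child.left, moved)
    return Select(node.relation, node.condition, child)

# SELECT sits above the JOIN, as in the first query tree of these slides.
tree = Select("WORKS_ON", "PNO = 3",
              Join("SSN = ESSN", Relation("EMPLOYEE"), Relation("WORKS_ON")))
print(push_down(tree))
```

After the rewrite, the filter on PNO sits directly above WORKS_ON, so the join receives only the rows for project 3.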
Semantic Query Optimization
• Semantic: of or relating to meaning or the study of meaning.
• Semantic information stored in databases as integrity
constraints can be used for query optimization.
• Integrity constraints preserve data consistency when changes
are made to a database.
• Semantic query optimization is a different approach to query
optimization. It may be used in combination with the techniques
discussed previously, and it uses constraints specified on the
database schema,
– such as unique attributes and other more complex constraints.
Semantic Query Optimization…
• SELECT E.Lname, M.Lname FROM EMPLOYEE AS
E, EMPLOYEE AS M WHERE E.Super_ssn=M.Ssn AND
E.Salary > M.Salary
• This query retrieves the names of employees who earn
more than their supervisors.
• Suppose that we had a constraint on the database schema
that stated that no employee can earn more than his or
her direct supervisor. 
– If the semantic query optimizer checks for the existence of this
constraint, it does not need to execute the query at all because
it knows that the result of the query will be empty.
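The check described above can be sketched as a lookup: if a declared constraint is the exact negation of a query predicate, the result must be empty. This is a deliberately narrow illustration; the triple representation and the function always_empty are assumptions for the example, and a real semantic optimiser would reason about implication between conditions, not only exact negation.

```python
# A comparison is a (left, operator, right) triple over attribute names.
NEGATION = {">": "<=", "<=": ">", "<": ">=", ">=": "<"}

def always_empty(predicate, constraints):
    """True if some declared constraint is the exact negation of the predicate."""
    left, op, right = predicate
    return (left, NEGATION[op], right) in constraints

# Hypothetical constraint: no employee earns more than his or her direct supervisor.
constraints = {("E.Salary", "<=", "M.Salary")}

print(always_empty(("E.Salary", ">", "M.Salary"), constraints))  # True
print(always_empty(("E.Salary", "<", "M.Salary"), constraints))  # False
```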
Semantic Query Optimization…
• Query execution can be improved by:
– Analyzing integrity information, and rewriting
queries exploiting this information
– Avoiding expensive sorting costs (Order
Optimization)
– Exploiting uniqueness: knowing that rows will be
unique avoids extra sorts
Semantic Query Optimization techniques
• Join Elimination (JE)

• Predicate Introduction (PI)

• Order Optimization (OO)

• Exploiting Uniqueness (EU)
