Chapter 2 Query processing and optimization [Autosaved]

software

Uploaded by

amentiabraham674
Copyright
© © All Rights Reserved

Advanced Database Systems

Chapter 2: Query Processing and Optimization
Overview of Query Processing
What is query processing?
• The activities involved in parsing, validating, optimizing, and executing a query.
• The aim of query processing is to transform a query written in a high-level language (typically SQL) into a correct and efficient execution strategy expressed in a low-level language (implementing the relational algebra).
What is query optimization?
• The activity of choosing an efficient execution strategy for processing a query.
• It is an important aspect of query processing.
• Because a high-level query usually has many equivalent transformations, the aim of query optimization is to choose the one that minimizes resource usage.
• Generally, we try to reduce the total execution time of the query, which is the sum of the execution times of all the individual operations that make up the query.
Query optimization: Example
Comparison of different processing strategies
Find all Managers who work at a London branch.
We can write this query in SQL as:

SELECT *
FROM Staff s, Branch b
WHERE s.branchNo = b.branchNo AND
      (s.position = 'Manager' AND b.city = 'London');
Three equivalent relational algebra queries corresponding to this SQL statement are:
1. $\sigma_{(\text{position}=\text{'Manager'}) \land (\text{city}=\text{'London'}) \land (\text{Staff.branchNo}=\text{Branch.branchNo})}(\text{Staff} \times \text{Branch})$
2. $\sigma_{(\text{position}=\text{'Manager'}) \land (\text{city}=\text{'London'})}(\text{Staff} \bowtie_{\text{Staff.branchNo}=\text{Branch.branchNo}} \text{Branch})$
3. $(\sigma_{\text{position}=\text{'Manager'}}(\text{Staff})) \bowtie_{\text{Staff.branchNo}=\text{Branch.branchNo}} (\sigma_{\text{city}=\text{'London'}}(\text{Branch}))$
For this particular example, assume that:
• there are 1000 tuples in Staff and 50 tuples in Branch;
• there are 50 Managers (one for each branch) and 5 London branches;
• there are no indexes or sort keys on either relation.
We compare the three queries by the number of disk accesses each requires.
The first query calculates the Cartesian product of Staff and Branch:
$\sigma_{(\text{position}=\text{'Manager'}) \land (\text{city}=\text{'London'}) \land (\text{Staff.branchNo}=\text{Branch.branchNo})}(\text{Staff} \times \text{Branch})$
• Reading the two relations requires (1000 + 50) disk accesses.
• The product is a relation with (1000 * 50) tuples, which takes (1000 * 50) disk accesses to write.
• Each of these tuples is then read again to test it against the selection predicate, which takes another (1000 * 50) disk accesses.
• This gives a total cost of (1000 + 50) + 2*(1000 * 50) = 101 050 disk accesses.
The second query joins Staff and Branch on the branch number, branchNo:
$\sigma_{(\text{position}=\text{'Manager'}) \land (\text{city}=\text{'London'})}(\text{Staff} \bowtie_{\text{Staff.branchNo}=\text{Branch.branchNo}} \text{Branch})$
• Reading each of the relations requires (1000 + 50) disk accesses.
• The join of the two relations has 1000 tuples, one for each member of staff (a member of staff can work at only one branch), so writing it takes 1000 disk accesses.
• The Selection operation requires another 1000 disk accesses to read the result of the join.
• This gives a total cost of 2*1000 + (1000 + 50) = 3050 disk accesses.
The final query applies the two Selection operations first and then joins the reduced relations:
$(\sigma_{\text{position}=\text{'Manager'}}(\text{Staff})) \bowtie_{\text{Staff.branchNo}=\text{Branch.branchNo}} (\sigma_{\text{city}=\text{'London'}}(\text{Branch}))$
• The first Selection reads each Staff tuple to determine the Manager tuples, which requires 1000 disk accesses and produces a relation with 50 tuples.
• The second Selection reads each Branch tuple to determine the London branches, which requires 50 disk accesses and produces a relation with 5 tuples.
• The final operation is the join of the reduced Staff and Branch relations, which requires (50 + 5) disk accesses.
• Counting the 50 + 5 disk accesses needed to write the two intermediate relations, this gives a total cost of 1000 + 2*50 + 5 + (50 + 5) = 1160 disk accesses.
Clearly the third option is the best in this case: it is cheaper than the first by a factor of about 87:1.
If the amount of data is increased tenfold, the factor grows to roughly 870:1.
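
The arithmetic behind the three strategies can be replayed in a few lines. This is a minimal sketch: the function and constant names are hypothetical, and it assumes, as the totals above imply, that every intermediate relation is written to disk and read back one tuple per access.

```python
# Figures assumed in the example; all names here are illustrative only.
STAFF = 1000     # tuples in Staff
BRANCH = 50      # tuples in Branch
MANAGERS = 50    # Staff tuples with position = 'Manager'
LONDON = 5       # Branch tuples with city = 'London'

def cost_product_then_select():
    # Read both relations, write the product, read it back for the selection.
    return (STAFF + BRANCH) + 2 * (STAFF * BRANCH)

def cost_join_then_select():
    # Read both relations, write the 1000-tuple join, read it back.
    return (STAFF + BRANCH) + 2 * STAFF

def cost_select_then_join():
    # Read Staff, write Managers, read Branch, write London branches,
    # then read both reduced relations for the join.
    return STAFF + MANAGERS + BRANCH + LONDON + (MANAGERS + LONDON)

costs = {
    "1: product then select": cost_product_then_select(),
    "2: join then select": cost_join_then_select(),
    "3: select then join": cost_select_then_join(),
}
ratio = costs["1: product then select"] / costs["3: select then join"]

for name, cost in costs.items():
    print(f"{name}: {cost} disk accesses")
print(f"strategy 1 / strategy 3 = {ratio:.1f}")
```

Changing the four constants shows how quickly the gap widens as the relations grow, because only the first strategy has a cost term proportional to the product of the two relation sizes.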
Phases of query processing
[Figure: phases of query processing]
Dynamic versus static optimization
• Dynamic: decomposition and optimization are carried out every time the query is run.
• Static: the query is parsed, validated, and optimized once, and the resulting strategy is reused on later runs.
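
The difference between the two approaches comes down to how often the optimizer is invoked. The sketch below is an illustration only: `optimize` is a stand-in for the parse, validate, and optimize phases, and the plan cache is an assumed mechanism for reusing a stored strategy, not a description of any particular DBMS.

```python
optimizer_calls = 0
_plan_cache = {}  # hypothetical store for strategies produced by static optimization

def optimize(query):
    """Stand-in for parsing, validating, and optimizing a query."""
    global optimizer_calls
    optimizer_calls += 1
    return f"PLAN<{query}>"

def run_dynamic(query):
    # Dynamic: decompose and optimize on every execution.
    return optimize(query)

def run_static(query):
    # Static: optimize once, then reuse the stored strategy.
    if query not in _plan_cache:
        _plan_cache[query] = optimize(query)
    return _plan_cache[query]

query = "SELECT * FROM Staff"

for _ in range(3):
    run_dynamic(query)
dynamic_calls = optimizer_calls

optimizer_calls = 0
for _ in range(3):
    run_static(query)
static_calls = optimizer_calls

print(dynamic_calls, static_calls)
```

Three executions cost three optimizer runs in the dynamic case and one in the static case. The trade-off is that a stored strategy is based on whatever the optimizer knew when it ran.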
Query Decomposition
• Query decomposition is the first phase of query processing.
• Its aim is to transform a high-level query into a relational algebra query.
Stages of query decomposition
1. Analysis
2. Normalization
3. Semantic analysis
4. Simplification
5. Query restructuring
1. Analysis
• The query is lexically and syntactically analyzed using the techniques of programming language compilers.
• The analysis also verifies that the relations and attributes specified in the query are defined in the system catalog, and that the operations applied to them suit their data types.
• Example: assume a Staff table with a staffNo attribute and a position attribute that holds a variable-length character string. The following query is rejected because staffNumber is not defined in Staff and the comparison > 10 is incompatible with the data type of position.

SELECT staffNumber
FROM Staff
WHERE position > 10;
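
The catalog check in the analysis stage can be sketched as a lookup against a dictionary. Everything here is hypothetical: `CATALOG`, `analyze`, and the two type names are made up for the illustration, and the query is passed in already split into parts rather than parsed from SQL text.

```python
# Hypothetical system catalog: relation name -> {attribute name: data type}.
CATALOG = {"Staff": {"staffNo": "varchar", "position": "varchar"}}

def analyze(relation, select_attrs, where):
    """Return a list of error messages.

    `where` is a list of (attribute, operator, literal) comparisons.
    """
    attrs = CATALOG.get(relation)
    if attrs is None:
        return [f"relation {relation} is not defined"]
    errors = []
    for attr in select_attrs:
        if attr not in attrs:
            errors.append(f"attribute {attr} is not defined in {relation}")
    for attr, op, literal in where:
        if attr not in attrs:
            errors.append(f"attribute {attr} is not defined in {relation}")
            continue
        literal_type = "numeric" if isinstance(literal, (int, float)) else "varchar"
        if attrs[attr] != literal_type:
            errors.append(
                f"{attr} {op} {literal!r}: {attrs[attr]} compared with {literal_type}"
            )
    return errors

# SELECT staffNumber FROM Staff WHERE position > 10;
errors = analyze("Staff", ["staffNumber"], [("position", ">", 10)])
for message in errors:
    print(message)
```

Both problems from the example are reported: the undefined attribute and the type mismatch in the comparison.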
2. Normalization
• Converts the query into a normalized form that can be more easily manipulated.
• In SQL, the WHERE condition is converted into one of two forms by applying a few transformation rules.
• Conjunctive normal form: a sequence of conjuncts connected with the ∧ (AND) operator, where each conjunct contains one or more terms connected by the ∨ (OR) operator.
  e.g. (position = 'Manager' ∨ salary > 20000) ∧ branchNo = 'B003'
• Disjunctive normal form: a sequence of disjuncts connected with the ∨ (OR) operator, where each disjunct contains one or more terms connected by the ∧ (AND) operator.
  e.g. (position = 'Manager' ∧ branchNo = 'B003') ∨ (salary > 20000 ∧ branchNo = 'B003')
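
The two example predicates are the same condition in different normal forms, which a truth table confirms. In this small sketch the three Boolean parameters stand for the terms position = 'Manager', salary > 20000, and branchNo = 'B003'; the function names are illustrative.

```python
from itertools import product

def cnf(is_manager, high_salary, in_b003):
    # (position = 'Manager' OR salary > 20000) AND branchNo = 'B003'
    return (is_manager or high_salary) and in_b003

def dnf(is_manager, high_salary, in_b003):
    # (position = 'Manager' AND branchNo = 'B003')
    #   OR (salary > 20000 AND branchNo = 'B003')
    return (is_manager and in_b003) or (high_salary and in_b003)

# Compare the two forms on all 2**3 truth assignments.
equivalent = all(
    cnf(*values) == dnf(*values)
    for values in product([False, True], repeat=3)
)
print(equivalent)
```

The equivalence follows from distributing ∧ over ∨, which is one of the transformation rules normalization relies on.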
3. Semantic analysis
• The objective of semantic analysis is to reject normalized queries that are incorrectly formulated or contradictory.
• A query is incorrectly formulated if some of its components do not contribute to the generation of the result, which may happen if some join specifications are missing.
• A query is contradictory if no tuple can satisfy its predicate. For example, the predicate (position = 'Manager' ∧ position = 'Assistant') on the Staff relation is contradictory, as a member of staff cannot be both a Manager and an Assistant simultaneously.
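
A very restricted form of the contradiction check can be sketched as follows. It assumes the predicate is a conjunction of equality terms only, and `is_contradictory` is a hypothetical helper; a real semantic analyzer handles far more general predicates.

```python
def is_contradictory(conjuncts):
    """Detect two equality terms that demand different values of one attribute.

    `conjuncts` is a list of (attribute, value) pairs joined by AND.
    """
    required = {}
    for attribute, value in conjuncts:
        if attribute in required and required[attribute] != value:
            return True
        required[attribute] = value
    return False

# position = 'Manager' AND position = 'Assistant'
print(is_contradictory([("position", "Manager"), ("position", "Assistant")]))
# position = 'Manager' AND branchNo = 'B003'
print(is_contradictory([("position", "Manager"), ("branchNo", "B003")]))
```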
4. Simplification
• The objectives of the simplification stage are to detect redundant qualifications, eliminate common subexpressions, and transform the query into a semantically equivalent but more easily and efficiently computed form.
• For example, the following rules of Boolean algebra can be applied:
  p ∧ (p) ≡ p          p ∨ (p) ≡ p
  p ∧ false ≡ false    p ∨ false ≡ p
  p ∧ true ≡ p         p ∨ true ≡ true
  p ∧ (~p) ≡ false     p ∨ (~p) ≡ true
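
Each of the eight rules can be verified by trying both truth values of p. The sketch below does exactly that; the dictionary keys are just labels for the rules.

```python
# Each entry pairs a rule label with a check of that rule for one value of p.
identities = {
    "p AND p == p": lambda p: (p and p) == p,
    "p OR p == p": lambda p: (p or p) == p,
    "p AND false == false": lambda p: (p and False) == False,
    "p OR false == p": lambda p: (p or False) == p,
    "p AND true == p": lambda p: (p and True) == p,
    "p OR true == true": lambda p: (p or True) == True,
    "p AND (NOT p) == false": lambda p: (p and (not p)) == False,
    "p OR (NOT p) == true": lambda p: (p or (not p)) == True,
}

# A rule holds if its check passes for p = False and p = True.
results = {
    rule: all(check(p) for p in (False, True))
    for rule, check in identities.items()
}
for rule, holds in results.items():
    print(f"{rule}: {holds}")
```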
5. Query restructuring
• The query is restructured to provide a more efficient implementation.
Heuristic Approach to Query Optimization
• Uses transformation rules to convert one relational algebra expression into an equivalent form that is known to be more efficient.
Transformation Rules for the Relational Algebra Operations
• By applying transformation rules, the optimizer can transform one relational algebra expression into an equivalent expression.
• In listing these rules, we use three relations R, S, and T, with R defined over the attributes A = {A1, A2, ..., An} and S defined over B = {B1, B2, ..., Bn}; p, q, and r denote predicates, and L, L1, L2, M, M1, M2, and N denote sets of attributes.
1. Conjunctive Selection operations can cascade into individual Selection operations (and vice versa).
• This transformation is sometimes referred to as cascade of selection.
• $\sigma_{p \land q \land r}(R) = \sigma_p(\sigma_q(\sigma_r(R)))$
E.g.
• $\sigma_{\text{branchNo}=\text{'B003'} \land \text{salary}>15000}(\text{Staff}) = \sigma_{\text{branchNo}=\text{'B003'}}(\sigma_{\text{salary}>15000}(\text{Staff}))$
2. Commutativity of Selection operations.
• $\sigma_p(\sigma_q(R)) = \sigma_q(\sigma_p(R))$
E.g.
• $\sigma_{\text{branchNo}=\text{'B003'}}(\sigma_{\text{salary}>15000}(\text{Staff})) = \sigma_{\text{salary}>15000}(\sigma_{\text{branchNo}=\text{'B003'}}(\text{Staff}))$
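
Rules 1 and 2 can be checked on a small in-memory relation. This is a sketch under stated assumptions: the Staff rows are made-up sample data, and `select` is a hypothetical helper that models the Selection operation as a filter over a list of dictionaries.

```python
# Hypothetical sample data for the Staff relation.
staff = [
    {"staffNo": "SG5", "branchNo": "B003", "salary": 24000},
    {"staffNo": "SG37", "branchNo": "B003", "salary": 12000},
    {"staffNo": "SL21", "branchNo": "B005", "salary": 30000},
    {"staffNo": "SG14", "branchNo": "B003", "salary": 18000},
]

def select(predicate, relation):
    """Selection: keep the tuples that satisfy the predicate."""
    return [t for t in relation if predicate(t)]

def in_b003(t):
    return t["branchNo"] == "B003"

def well_paid(t):
    return t["salary"] > 15000

# Rule 1: one conjunctive selection versus a cascade of two selections.
conjunctive = select(lambda t: in_b003(t) and well_paid(t), staff)
cascaded = select(in_b003, select(well_paid, staff))

# Rule 2: the two selections applied in the opposite order.
swapped = select(well_paid, select(in_b003, staff))

print([t["staffNo"] for t in conjunctive])
print(conjunctive == cascaded == swapped)
```

The result is the same either way, so the optimizer is free to apply whichever selection removes more tuples first.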
3. In a sequence of Projection operations, only the last in the sequence (the outermost) is required.
• $\Pi_L \Pi_M \ldots \Pi_N(R) = \Pi_L(R)$
E.g.
• $\Pi_{\text{lName}} \Pi_{\text{branchNo, lName}}(\text{Staff}) = \Pi_{\text{lName}}(\text{Staff})$
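
The projection rule can be checked the same way. In this sketch the rows are invented sample data and `project` is a hypothetical helper that models Projection with duplicate elimination.

```python
# Hypothetical sample data; two different members of staff share a last name.
staff = [
    {"lName": "White", "branchNo": "B005"},
    {"lName": "Beech", "branchNo": "B003"},
    {"lName": "Beech", "branchNo": "B007"},
    {"lName": "Ford", "branchNo": "B003"},
]

def project(attrs, relation):
    """Projection: keep only `attrs` and drop duplicate tuples (order preserved)."""
    rows = dict.fromkeys(tuple(t[a] for a in attrs) for t in relation)
    return [dict(zip(attrs, row)) for row in rows]

nested = project(["lName"], project(["branchNo", "lName"], staff))
direct = project(["lName"], staff)

print(direct)
print(nested == direct)
```

The inner projection does no useful work here, which is why only the outermost one needs to be kept.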
4. Commutativity of Selection and Projection.
• If the predicate p involves only the attributes in the projection list, then the Selection and Projection operations commute:
  $\Pi_{A_1, \ldots, A_m}(\sigma_p(R)) = \sigma_p(\Pi_{A_1, \ldots, A_m}(R))$, where p refers only to attributes in $\{A_1, A_2, \ldots, A_m\}$
E.g.
• $\Pi_{\text{fName, lName}}(\sigma_{\text{lName}=\text{'Beech'}}(\text{Staff})) = \sigma_{\text{lName}=\text{'Beech'}}(\Pi_{\text{fName, lName}}(\text{Staff}))$
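
The condition on p matters, and a small experiment shows why. The sketch below uses invented Staff rows and hypothetical `select` and `project` helpers: the two orders agree when the predicate uses lName, but selecting on position after position has been projected away is not even well defined.

```python
# Hypothetical sample data for the Staff relation.
staff = [
    {"fName": "John", "lName": "White", "position": "Manager"},
    {"fName": "Ann", "lName": "Beech", "position": "Assistant"},
    {"fName": "Julie", "lName": "Lee", "position": "Assistant"},
]

def select(predicate, relation):
    return [t for t in relation if predicate(t)]

def project(attrs, relation):
    rows = dict.fromkeys(tuple(t[a] for a in attrs) for t in relation)
    return [dict(zip(attrs, row)) for row in rows]

def is_beech(t):
    return t["lName"] == "Beech"

def is_manager(t):
    return t["position"] == "Manager"

names = ["fName", "lName"]

# The predicate uses only lName, which is in the projection list.
left = project(names, select(is_beech, staff))
right = select(is_beech, project(names, staff))
print(left == right)

# The predicate uses position, which the projection removes.
try:
    select(is_manager, project(names, staff))
    position_available = True
except KeyError:
    position_available = False
print(position_available)
```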
5. Commutativity of Theta join (and Cartesian product).
• $R \bowtie_p S = S \bowtie_p R$
• $R \times S = S \times R$
E.g.
• $\text{Staff} \bowtie_{\text{Staff.branchNo}=\text{Branch.branchNo}} \text{Branch} = \text{Branch} \bowtie_{\text{Staff.branchNo}=\text{Branch.branchNo}} \text{Staff}$
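
Join commutativity holds for relations viewed as sets of tuples, where attribute order carries no meaning. The sketch below reflects that by comparing results as sets; the rows are invented, attribute names are written in qualified form to avoid a clash on branchNo, and `theta_join` is a hypothetical nested-loop helper.

```python
# Hypothetical sample data with qualified attribute names.
staff = [
    {"Staff.staffNo": "SG5", "Staff.branchNo": "B003"},
    {"Staff.staffNo": "SL21", "Staff.branchNo": "B005"},
    {"Staff.staffNo": "SA9", "Staff.branchNo": "B007"},
]
branch = [
    {"Branch.branchNo": "B003", "Branch.city": "Glasgow"},
    {"Branch.branchNo": "B005", "Branch.city": "London"},
]

def theta_join(r, s, predicate):
    """Nested-loop theta join: combine every pair that satisfies the predicate."""
    result = []
    for a in r:
        for b in s:
            combined = {**a, **b}
            if predicate(combined):
                result.append(combined)
    return result

def same_branch(t):
    return t["Staff.branchNo"] == t["Branch.branchNo"]

def as_set(relation):
    # Ignore tuple order and attribute order when comparing relations.
    return {frozenset(t.items()) for t in relation}

staff_first = theta_join(staff, branch, same_branch)
branch_first = theta_join(branch, staff, same_branch)
print(as_set(staff_first) == as_set(branch_first))
```

Because the result is the same, the optimizer can choose whichever operand order is cheaper for the join algorithm in use.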
Heuristic Processing Strategies
• Perform Selection operations as early as possible.
• Combine the Cartesian product with a subsequent Selection operation whose predicate represents a join condition into a Join operation.
• Use associativity of binary operations to rearrange leaf nodes so that the leaf nodes with the most restrictive Selection operations are executed first.
• Perform Projection operations as early as possible.
• Compute common expressions once.
Heuristic Query Optimization: Example
• Consider the following relations:
  Employee (Fname, Mname, Lname, Ssn, Bdate, Address, Gender, Salary, Superssn, Dno)
  Project (Pname, Pnumber, Plocation, Dnum)
  Works_On (Essn, Pno, Hours)
• Query Q on these relations: find the last names of employees born after 1957 who work on a project named 'Aquarius'.
• This query can be specified in SQL as follows:

Q: SELECT Lname
   FROM EMPLOYEE, WORKS_ON, PROJECT
   WHERE Pname = 'Aquarius' AND Pnumber = Pno
     AND Essn = Ssn
     AND Bdate > '1957-12-31';
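
Query Q can be tried out with Python's built-in `sqlite3` module. The sketch makes several assumptions: the tables carry only the columns Q refers to, dates are stored as ISO-format text so that string comparison orders them correctly, and every row is invented for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Only the columns that Q refers to; the data is made up.
    CREATE TABLE EMPLOYEE (Lname TEXT, Ssn TEXT, Bdate TEXT);
    CREATE TABLE PROJECT (Pname TEXT, Pnumber INTEGER);
    CREATE TABLE WORKS_ON (Essn TEXT, Pno INTEGER);

    INSERT INTO EMPLOYEE VALUES
        ('Smith', '1', '1965-01-09'),
        ('Wong', '2', '1955-12-08'),
        ('Zelaya', '3', '1968-07-19');
    INSERT INTO PROJECT VALUES ('Aquarius', 10), ('ProductX', 20);
    INSERT INTO WORKS_ON VALUES ('1', 10), ('2', 10), ('3', 20);
""")

rows = conn.execute("""
    SELECT Lname
    FROM EMPLOYEE, WORKS_ON, PROJECT
    WHERE Pname = 'Aquarius' AND Pnumber = Pno
      AND Essn = Ssn
      AND Bdate > '1957-12-31'
""").fetchall()
conn.close()

print(rows)
```

With this data, Wong works on Aquarius but was born before 1958, and Zelaya was born later but works on a different project, so only Smith qualifies.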
Simplified steps in converting a query tree during heuristic optimization
1. Initial (canonical) query tree for SQL query Q.
2. Moving SELECT operations down the query tree.
3. Applying the more restrictive SELECT operation first.
4. Replacing CARTESIAN PRODUCT and SELECT with JOIN operations.
5. Moving PROJECT operations down the query tree.
1. Initial (canonical) query tree for SQL query Q.
[Figure: initial (canonical) query tree for Q]
2. Moving SELECT operations down the query tree.
[Figure: query tree after moving SELECT operations down]
3. Applying the more restrictive SELECT operation first.
[Figure: query tree after applying the more restrictive SELECT operation first]
4. Replacing CARTESIAN PRODUCT and SELECT with JOIN operations.
$\sigma_{R.a=S.b}(R \times S) = R \bowtie_{R.a=S.b} S$
[Figure: query tree after replacing CARTESIAN PRODUCT and SELECT with JOIN]
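
The benefit of step 4 is the size of the intermediate result. The following sketch uses two invented relations R and S with single attributes a and b, and hypothetical helper names; it counts how many tuples each approach materializes before producing the same answer.

```python
from itertools import product

# Hypothetical relations: R has 100 tuples, S has 10.
R = [{"a": i} for i in range(100)]
S = [{"b": i} for i in range(0, 100, 10)]

def select_over_product(r, s):
    """Build the full Cartesian product, then filter it."""
    intermediate = [{**x, **y} for x, y in product(r, s)]
    result = [t for t in intermediate if t["a"] == t["b"]]
    return result, len(intermediate)

def join(r, s):
    """Apply the join condition while pairing tuples, keeping only matches."""
    result = [{**x, **y} for x in r for y in s if x["a"] == y["b"]]
    return result, len(result)

product_result, product_size = select_over_product(R, S)
join_result, join_size = join(R, S)

print(product_result == join_result)
print(product_size, join_size)
```

Both approaches compare the same pairs in this naive version, but the join never stores the 1000-tuple product, which is the saving counted in the disk-access example earlier in the chapter.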
5. Moving PROJECT operations down the query tree.
[Figure: query tree after moving PROJECT operations down]
Employee (Fname, Mname, Lname, Ssn, Bdate, Address, Gender, Salary, Superssn, Dno)
Project (Pname, Pnumber, Plocation, Dnum)
Works_On (Essn, Pno, Hours)
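
The five steps can also be written as relational algebra expressions, one per query tree. What follows is one plausible reconstruction derived from Q and the step descriptions, not a copy of the slides' figures, so the exact tree shapes are an assumption. Each expression is equivalent to the one before it.

```latex
% Step 1: initial (canonical) tree: product of all relations, one selection, one projection
\Pi_{Lname}\bigl(\sigma_{Pname='Aquarius' \land Pnumber=Pno \land Essn=Ssn \land Bdate>'1957\text{-}12\text{-}31'}
  ((EMPLOYEE \times WORKS\_ON) \times PROJECT)\bigr)

% Step 2: SELECT operations moved down the tree
\Pi_{Lname}\bigl(\sigma_{Pnumber=Pno}\bigl(\sigma_{Essn=Ssn}(\sigma_{Bdate>'1957\text{-}12\text{-}31'}(EMPLOYEE) \times WORKS\_ON)
  \times \sigma_{Pname='Aquarius'}(PROJECT)\bigr)\bigr)

% Step 3: the more restrictive SELECT (on Pname) applied first
\Pi_{Lname}\bigl(\sigma_{Essn=Ssn}\bigl(\sigma_{Pnumber=Pno}(\sigma_{Pname='Aquarius'}(PROJECT) \times WORKS\_ON)
  \times \sigma_{Bdate>'1957\text{-}12\text{-}31'}(EMPLOYEE)\bigr)\bigr)

% Step 4: CARTESIAN PRODUCT followed by SELECT replaced with JOIN
\Pi_{Lname}\bigl((\sigma_{Pname='Aquarius'}(PROJECT) \bowtie_{Pnumber=Pno} WORKS\_ON)
  \bowtie_{Essn=Ssn} \sigma_{Bdate>'1957\text{-}12\text{-}31'}(EMPLOYEE)\bigr)

% Step 5: PROJECT operations moved down the tree
\Pi_{Lname}\bigl(\Pi_{Essn}(\Pi_{Pnumber}(\sigma_{Pname='Aquarius'}(PROJECT)) \bowtie_{Pnumber=Pno} \Pi_{Essn,Pno}(WORKS\_ON))
  \bowtie_{Essn=Ssn} \Pi_{Ssn,Lname}(\sigma_{Bdate>'1957\text{-}12\text{-}31'}(EMPLOYEE))\bigr)
```

Step 3 starts from PROJECT on the assumption that only one project is named 'Aquarius', which makes that selection more restrictive than the one on Bdate.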
