Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                

Query Processing and Optimization PDF

Download as pdf or txt
Download as pdf or txt
You are on page 1of 73

Lecture 3: Query processing and

optimization
Objectives:
Reduce processing time
Reduce buffer memory
Reduce communication cost between sites
Low resource usage

Hoa Dinh Nguyen, PhD.


PTIT 1 Department of Information Technology
Introduction

 Functions of query processing:


www.ptit.edu.vn

• Transform a complicated query into a much


simpler query.
• This transformation must ensure correctness
and efficiency
• Each transformation method leads to different
resource usages → lowest resource usage.

Hoa Dinh Nguyen, PhD.


Lecture 3: Query processing and optimization 2 Department of Information Technology
Introduction
Simple transformation methods
Relational algebra:
www.ptit.edu.vn

 Simplify queries based on equivalent relational algebra


expressions such that the querying time is minimized.
 This method disregards database structure and size.
Cost estimation:
 Specify data size and processing time of each element in the
query.
 This method takes into account the data size and real
processing time of the query.

Hoa Dinh Nguyen, PhD.


Lecture 3: Query processing and optimization 3 Department of Information Technology
Querying Process

 Procedure of query execution


 Query pre-processing
www.ptit.edu.vn

 Query transformation
 Query optimization

Hoa Dinh Nguyen, PhD.


Lecture 3: Query processing and optimization 4 Department of Information Technology
www.ptit.edu.vn Querying Process

Hoa Dinh Nguyen, PhD.


Lecture 3: Query processing and optimization 5 Department of Information Technology
Querying Process

 Query preprocessor:
 Scanning: key words, attribute names,
www.ptit.edu.vn

relations,…
 Parsing: syntax checking, parse tree
representation
 Validating: semantic checking (relations,
attributes, data types)

Hoa Dinh Nguyen, PhD.


Lecture 3: Query processing and optimization 6 Department of Information Technology
Querying Process

 Query optimizer:
 Select suitable strategy for query processing
www.ptit.edu.vn

 Query code generator:


 Generate codes to implement the plan
 Runtime database processor:
 Compile codes to provide query results

Hoa Dinh Nguyen, PhD.


Lecture 3: Query processing and optimization 7 Department of Information Technology
www.ptit.edu.vn Querying Process

Hoa Dinh Nguyen, PhD.


Lecture 3: Query processing and optimization 8 Department of Information Technology
Querying Process
Transform in to relational algebra expressions
 Query blocks: SELECT-FROM-WHERE-
www.ptit.edu.vn

GROUP BY-HAVING
 Integrated queries: separate into query blocks

Hoa Dinh Nguyen, PhD.


Lecture 3: Query processing and optimization 10 Department of Information Technology
Querying Process
Transform in to relational algebra expressions
SELECT Name, Salary
www.ptit.edu.vn

FROM Staff
WHERE Salary > (SELECT AVG(Salary)
FROM Staff
WHERE Gender = “Male”)

Hoa Dinh Nguyen, PhD.


Lecture 3: Query processing and optimization 11 Department of Information Technology
Query Processing in Centralized DBS
Comparison between centralized and distributed query
processing:
Centralized:
www.ptit.edu.vn

 Select the best relational algebra expression equivalent to the


query
 Query processing strategies are described using relational algebra
extensions
Distributed:
 Inherited from centralized environment
 More issues to concern:
 Math expressions of data transmission between sites
 Choose the best site to process data
 Data transmission methods
Hoa Dinh Nguyen, PhD.
Lecture 3: Query processing and optimization 12 Department of Information Technology
SQL

Syntax checking Optimization in RAE


www.ptit.edu.vn

Query with correct syntax Optimized RAE

Suitability checking Select optimized solution

Resonable SQL Implementation planning

Compile SQL query Generate codes

Relational algebra expression Query codes


Hoa Dinh Nguyen, PhD.
Lecture 3: Query processing and optimization 13 Department of Information Technology
Query on local relations

Global
Query decomposition Schema
Algebraic query on global relations
Control Fragment
www.ptit.edu.vn

Site Data localization Schema


Algebraic query on fragments

Local
Global Optimization Schema
Distributed query execution plan

Local sites Distributed execution


Optimized local queries

Generic layering scheme for distributed query processing 14

Hoa Dinh Nguyen, PhD.


Lecture 3: Query processing and optimization 14 Department of Information Technology
Query Processing in Centralized DBS
Optimization strategies in centralized databases:
 Distributed queries must be composed and processed in
a centralized manner
www.ptit.edu.vn

 All distributed query optimization techniques are


extensions from centralized query processing
approaches
 Centralized query optimization is often simple

Hoa Dinh Nguyen, PhD.


Lecture 3: Query processing and optimization 15 Department of Information Technology
Query Processing in Centralized DBS
INGRES algorithm: recursively breaks up a query expressed
in SQL into smaller pieces which are executed along the way
 The query is first decomposed into a sequence of queries
www.ptit.edu.vn

having a unique relation in common


 Each mono-relation query is processed by selecting, based
on the predicate, the best access method to that relation
E.g: For example, if the predicate is of the form A = value, an
index available on attribute A would be used if it exists.
However, if the predicate is of the form A ≠ value, an index
on A would not help, and sequential scan should be used.

Hoa Dinh Nguyen, PhD.


Lecture 3: Query processing and optimization 16 Department of Information Technology
Query Processing in Centralized DBS
INGRES algorithm:
 Executes first the unary (monorelation) operations and
tries to minimize the sizes of intermediate results in
www.ptit.edu.vn

ordering binary (multirelation) operations


 Denote by qi-1qi a query q decomposed into two
subqueries, qi-1 and qi, where qi-1 is executed first and its
result is consumed by qi
 Decomposes q into n subqueries q1q2 …qn using
detachment and substitution.

Hoa Dinh Nguyen, PhD.


Lecture 3: Query processing and optimization 17 Department of Information Technology
Query Processing in Centralized DBS
Detachment: breaks a query q into q’q’’, based on a
common relation that is the result of q’
q: SELECT R2.A2, R3.A3,. . ., Rn.An
www.ptit.edu.vn

FROM R1, R2,. . . , Rn


WHERE P1(R1.A’1) AND P2(R1.A1, R2.A2, . . . , Rn.An);
q’: SELECT R1A1 INTO R’1
FROM R1
WHERE P1(R1.A1);
q”:SELECT R2A2,. . ., RnAn
FROM R’1, R2,. . . , Rn
WHERE P2(R1.A1, R2.A2,. . ., Rn.An);
Hoa Dinh Nguyen, PhD.
Lecture 3: Query processing and optimization 18 Department of Information Technology
EMP (E) ASG (G)
ENO ENAME TITLE ENO PNO RESPONSIBILITY DUR
A1 Nam Phân tích HT A1 D1 Quản lý 12
A2 Trung Lập trình viên A2 D1 Phân tích 34
A3 Đông Phân tích HT A2 D2 Phân tích 6
A4 Bắc Phân tích HT A3 D3 Kỹ thuật 12
www.ptit.edu.vn

A5 Tây Lập trình viên A3 D4 Lập trình 10


A6 Hùng Kỹ sư điện A4 D2 Quản lý 6
A7 Dũng Phân tích HT A5 D2 Quản lý 20
A8 Chiến Thiết kế DL A6 D4 Kỹ thuật 36
A7 D3 Quản lý 48
A8 D3 Lập trình 15

PRJ (J) PAY(S)


PNO PNAME BUDGET TITLE SALARY
D1 CSDL 20000 Kỹ sư điện 1000
D2 CÀI ĐẶT 12000 Phân tích HT 2500
D3 BẢO TRÌ 28000 Lập trình viên 3000
D4 PHÁT TRIỂN 25000 Thiết kế DL 4000

Hoa Dinh Nguyen, PhD.


Lecture 3: Query processing and optimization 19 Department of Information Technology
Example
q1=“Find names of staffs working on project CSDL”
q 1: SELECT E.ENAME
FROM E, G, J
www.ptit.edu.vn

WHERE E.ENO = G.ENO AND G.PNO = J.PNO


AND PNAME = “CSDL”;
q1 is detached into q11q’, where TGIAN1 is an intermediate relation.
q11: SELECT J.PNO INTO TEMP1
FROM J
WHERE PNAME = “CSDL”;
q’: SELECT E.ENAME
FROM E, G, TEMP1
WHERE E.ENO = G.ENO
AND G.PNO = TEMP1.PNO;
Hoa Dinh Nguyen, PhD.
Lecture 3: Query processing and optimization 20 Department of Information Technology
Example
The successive detachments of q’ may generate
q12: SELECT G.ENO INTO TEMP2
FROM G, TEMP1
www.ptit.edu.vn

WHERE G.PNO =TEMP1.PNO;


q13: SELECT E.ENAME
FROM E, TEMP2
WHERE E.ENO = TEMP2.ENO;
q1 has been decomposed into q11q12q13
Query q11 is mono-relation and can be executed. However,
q12 and q13 are not mono-relation and cannot be reduced by
detachment.
Hoa Dinh Nguyen, PhD.
Lecture 3: Query processing and optimization 21 Department of Information Technology
Query Processing in Centralized DBS
Tuple substitution: Given an n-relation query q, the tuples of
one relation are substituted by their values, thereby producing
a set of (n-1)-relation queries.
www.ptit.edu.vn

 One relation in q is chosen for tuple substitution. Let R1 be that


relation.
 For each tuple t1i ∈ R1, the attributes referred to by in q are replaced
by their actual values in t1i, thereby generating a query q’ with n-1
relations.
q(R1,R2,...,Rn) is replaced by {q’(t1i,R2,R3, ... ,Rn),t1i ∈ R1}

Hoa Dinh Nguyen, PhD.


Lecture 3: Query processing and optimization 22 Department of Information Technology
Query Processing in Centralized DBS
Example: Let’s consider the query q13
q13: SELECT E.ENAME
FROM E, TEMP2
www.ptit.edu.vn

WHERE E.ENO = TEMP2.ENO ;


The relation TEMP2 is over a single attribute (ENO). Assume that it
contains only two tuples: <E1> and <E2>. The substitution of TEMP2
generates two one-relation subqueries:
q131: SELECT E.ENAME
FROM E
WHERE E.ENO = “E1”;
q132: SELECT E.ENAME
FROM E
WHERE E.ENO = “E2”;
These queries may then be executed
Hoa Dinh Nguyen, PhD.
Lecture 3: Query processing and optimization 23 Department of Information Technology
INGRES- QOA Algorithm
Input: MRQ: multirelation query with n relations
Output: Result of execution
Begin
Output 
If n=1 then
Output  run(MRQ)
www.ptit.edu.vn

Else
{ORQ1, ..., ORQm, MRQ’} MRQ
For i1 to m
Output’  run(ORQi)
Output  output  output’
Endfor
R  CHOOSE_ RELATION(MRQ’)
For each tuple t  R
MRQ”  substitute values for t in MRQ’
Output’  INGRES-QOA(MRQ”)
Output  output  output’
Endfor
Endif
End.
Lecture 3: {INGRES-QOA}
Query processing and optimization 24 Hoa Dinh Nguyen, PhD.
Department of Information Technology
Query Decomposition
Normalization: to transform the query to a
normalized form to facilitate further processing.
www.ptit.edu.vn

 Conjunctive normal form:


(p11 p12 ...  p1n)  ...  (pm1 pm2 ...  pmn)
 Disjunctive normal form
(p11  p12  ...  p1n)  ...  (pm1  pm2  ... pmn)
where pij is a simple predicate

Hoa Dinh Nguyen, PhD.


Lecture 3: Query processing and optimization 25 Department of Information Technology
Query Decomposition
Equivalence rules
 p1  p2 ⟺ p2  p1
 p1  p2 ⟺ p2  p1
www.ptit.edu.vn

 p1  (p2  p3) ⟺ (p1  p2 )  p3


 p1  (p2  p3) ⟺ (p1  p2)  p3
 p1  (p2  p3) ⟺ (p1  p2)  (p1  p3)
 p1  (p2  p3) ⟺ (p1  p2 )  (p1  p3)
 ¬(p1  p2) ⟺ ¬p2  ¬p1
 ¬(p1  p2) ⟺ ¬p2  ¬p1
 ¬(¬p) ⟺ p
Hoa Dinh Nguyen, PhD.
Lecture 3: Query processing and optimization 26 Department of Information Technology
Query Decomposition
Example: “Find the names of employees who have been
working on project P1 for 12 or 24 months”
SELECT ENAME
www.ptit.edu.vn

FROM EMP, ASG


WHERE EMP.ENO = ASG.ENO
AND ASG.PNO = "P1"
AND (DUR = 12 OR DUR = 24);
conjunctive normal form is
EMP.ENO=ASG.ENO  ASG.PNO = “P1”  (DUR =12  DUR=24)
disjunctive normal form is
(EMP.ENO = ASG.ENO  ASG.PNO = “P1”  DUR = 12) 
(EMP.ENO = ASG.ENO  ASG.PNO = “P1”  DUR = 24)
Hoa Dinh Nguyen, PhD.
Lecture 3: Query processing and optimization 27 Department of Information Technology
Query Decomposition
Analysis: enables rejection of normalized queries for
which further processing is either impossible or
www.ptit.edu.vn

unnecessary.
 A query is type incorrect: if any of its attribute or
relation names are not defined in the global schema,
or if operations are being applied to attributes of the
wrong type.
 A query is semantically incorrect if its components
do not contribute in any way to the generation of the
result.
Hoa Dinh Nguyen, PhD.
Lecture 3: Query processing and optimization 28 Department of Information Technology
Query Decomposition
Example: This query is type incorrect
SELECT E#
www.ptit.edu.vn

FROM EMP
WHERE ENAME > 200;
For 2 reasons:
 Attribute E# is not declared in the schema
 the operation “>200” is incompatible with the type
string of ENAME.

Hoa Dinh Nguyen, PhD.


Lecture 3: Query processing and optimization 29 Department of Information Technology
Query Decomposition
 In the context of relational calculus, it is not possible
to determine the semantic correctness of general
www.ptit.edu.vn

queries.
 It is possible to do so based on the representation of
the query as a graph, called a query graph or
connection graph

Hoa Dinh Nguyen, PhD.


Lecture 3: Query processing and optimization 30 Department of Information Technology
Query Decomposition
Query graph or connection graph
 One node indicates the result relation, and any other node
indicates an operand relation
www.ptit.edu.vn

 An edge between two nodes one of which does not


correspond to the result represents a join, whereas an edge
whose destination node is the result represents a project
 A non-result node may be labeled by a select or a self-join
(join of the relation with itself) predicate
 An important subgraph of the query graph is the join
graph, in which only the joins are considered

Hoa Dinh Nguyen, PhD.


Lecture 3: Query processing and optimization 31 Department of Information Technology
Query Decomposition
Example: “Find the names and responsibilities of programmers
who have been working on the CAD/CAM project for more
than 3 years.”
www.ptit.edu.vn

SELECT ENAME, RESP


FROM E, G, J
WHERE E.ENO = G.ENO
AND G.PNO = J.PNO
AND PNAME = "CAD/CAM"
AND DUR > 36
AND TITLE = “Programmer”;
Hoa Dinh Nguyen, PhD.
Lecture 3: Query processing and optimization 32 Department of Information Technology
Query Decomposition
DUR 36

E.ENO=G.ENO G.PNO=J.PNO
www.ptit.edu.vn

TITLE= “Programer” E G.RESPONSIBILITY J


PNAME=”CAD/CAM”
E.ENAME
Result
(a) Query graph

G
G.ENO=G.ENO G.PNO=J.PNO

E J
(b) Joint graph

Hoa Dinh Nguyen, PhD.


Lecture 3: Query processing and optimization 33 Department of Information Technology
Query Decomposition
Example: Consider the following query
SELECT ENAME, RESP
www.ptit.edu.vn

FROM E, G, J
WHERE E.ENO = G.ENO
AND PNAME = "CAD/CAM"
AND DUR > 36
AND TITLE = "Programmer”;

Hoa Dinh Nguyen, PhD.


Lecture 3: Query processing and optimization 34 Department of Information Technology
Query Decomposition
 Its query graph is disconnected, which tells us that the query
is semantically incorrect.
www.ptit.edu.vn

DUR 36

E.ENO=G.ENO

TITLE= “Programer” E G.RESPONSIBILITY J


PNAME=”CAD/CAM”
E.ENAME
Result

Query graph

Hoa Dinh Nguyen, PhD.


Lecture 3: Query processing and optimization 35 Department of Information Technology
Query Decomposition
Solutions to this problem:
 Reject the query
www.ptit.edu.vn

 Assume that there is an implicit Cartesian product


between relations G and J,
 Infer (using the schema) the missing join predicate
G.PNO = J.PNO which transforms the query into
that of previous Example.

Hoa Dinh Nguyen, PhD.


Lecture 3: Query processing and optimization 36 Department of Information Technology
Query Decomposition
 Elimination of Redundancy: The enriched query
qualification may contain redundant predicates. A
www.ptit.edu.vn

naive evaluation of a qualification with redundancy


can well lead to duplicated work. Such redundancy
and thus redundant work may be eliminated by
simplifying the qualification with the following well-
known idempotency rules:

Hoa Dinh Nguyen, PhD.


Lecture 3: Query processing and optimization 37 Department of Information Technology
Query Decomposition
1. p  p p 2. p  true  true
3. p  p p 4. p   p  false
www.ptit.edu.vn

5. p  true  p 6. p   p  true
7. p  false  p 8. p1  (p1  p2)  p1
9. p  false  false 10.p1  (p1  p2)  p1

Hoa Dinh Nguyen, PhD.


Lecture 3: Query processing and optimization 38 Department of Information Technology
Query Decomposition
Example: the SQL query
SELECT TITLE
www.ptit.edu.vn

FROM E
WHERE (NOT (TITLE = "Programmer")
AND (TITLE = "Programmer" OR TITLE = "Elect. Eng.")
AND NOT (TITLE = "Elect. Eng."))
OR ENAME = “J. Doe”;

Hoa Dinh Nguyen, PhD.


Lecture 3: Query processing and optimization 39 Department of Information Technology
Query Decomposition
Let:
p1: TITLE = “Programmer”,
www.ptit.edu.vn

p2: TITLE = “Elect. Eng.”,


p3: ENAME = “J. Doe”.
The query application is
( p1  (p1  p2)   p2)  p3
 (( p1  p1   p2)  ( p1  p2   p2))  p3
 ((false   p2)  ( p1  false) ) p3
 (false  false )  p3
 p3
Hoa Dinh Nguyen, PhD.
Lecture 3: Query processing and optimization 40 Department of Information Technology
Query Decomposition
The query can be simplified using the previous rules to
become
www.ptit.edu.vn

SELECT TITLE
FROM E
WHERE ENAME = "J. Doe";

Hoa Dinh Nguyen, PhD.


Lecture 3: Query processing and optimization 41 Department of Information Technology
Query Decomposition
Rewriting: rewrites the query in relational algebra. it
is customary to represent the relational algebra query
www.ptit.edu.vn

graphically by an operator tree.


 An operator tree is a tree in which a leaf node is a
relation stored in the database, and a non-leaf node is
an intermediate relation produced by a relational
algebra operator.
 The sequence of operations is directed from the
leaves to the root, which represents the answer to
the query.
Hoa Dinh Nguyen, PhD.
Lecture 3: Query processing and optimization 42 Department of Information Technology
Query Decomposition
The transformation of a tuple relational calculus query into an
operator tree can easily be achieved as follows
 A different leaf is created for each different tuple variable
www.ptit.edu.vn

(corresponding to a relation). In SQL, the leaves are


immediately available in the FROM clause.
 The root node is created as a project operation involving the
result attributes. These are found in the SELECT clause in
SQL.
 The qualification (SQL WHERE clause) is translated into the
appropriate sequence of relational operations (select, join,
union, etc.) going from the leaves to the root.
Hoa Dinh Nguyen, PhD.
Lecture 3: Query processing and optimization 43 Department of Information Technology
Query Decomposition
Example: “Find the names of employees other than J. Doe who
worked on the CAD/CAM project for either one or two
years”
www.ptit.edu.vn

SELECT ENAME
FROM PROJ, ASG, EMP
WHERE ASG.ENO = EMP.ENO
AND ASG.PNO = PROJ.PNO
AND ENAME != "J. Doe"
AND PROJ.PNAME = "CAD/CAM"
AND (DUR = 12 OR DUR = 24) ;
Hoa Dinh Nguyen, PhD.
Lecture 3: Query processing and optimization 44 Department of Information Technology
www.ptit.edu.vn Query Decomposition

Hoa Dinh Nguyen, PhD.


Lecture 3: Query processing and optimization 45 Department of Information Technology
Query Decomposition
1. Commutativity of binary operators
R×SS×R
www.ptit.edu.vn

R⋈SS⋈R
2. Associativity of binary operators
R × (S × T)  (R × S) × T
R ⋈ (S ⋈ T)  (R ⋈ S) ⋈ T
3. Idempotence of unary operators
A’(A’’(R))   A’(R) , where A’, A’’ R and A’  A’’
 p ( A ) ( p
1 1 2 ( A2 )
( R))   p1 ( A1 ) p2 ( A2 ) ( R)
Hoa Dinh Nguyen, PhD.
Lecture 3: Query processing and optimization 46 Department of Information Technology
Query Decomposition
4. Commuting selection with projection
 A1 ,...,An ( p ( Ap ) ( R))   A1 ,...,An ( p ( Ap ) ( A1 ,...,An , Ap ( R)))
www.ptit.edu.vn

 Note that if Ap is already a member of {A1,…,An}, the last


projection on [A1,…, An] on the right-hand side of the
equality is useless.

 A1 ,...,An ( p ( Ap ) ( R))   p ( Ap ) ( A1 ,...,An , Ap ( R))

Hoa Dinh Nguyen, PhD.


Lecture 3: Query processing and optimization 47 Department of Information Technology
Query Decomposition
5. Commuting selection with binary operators

 p ( A ) ( R  S )   p ( A ) ( R)  S
www.ptit.edu.vn

p p

 p( A ) (R
i p ( Ai , Bk )
S )   p ( Ai ) ( R ) p ( Ai , Bk ) S

 p ( A ) ( R  T )   p ( A ) ( R)   p ( A ) (T )
i i i

Hoa Dinh Nguyen, PhD.


Lecture 3: Query processing and optimization 48 Department of Information Technology
Query Decomposition
6. Commuting projection with binary operators
 If C=A’B’, where A’ A, B’ B, and A, B are the sets of
attributes over which relations R and S, respectively, are
www.ptit.edu.vn

defined, we have
C ( R  S )   A' ( R)   B ' (S )

C ( R p(Ai , B j ) S )   A ' ( R) p(Ai ,B j )


 B' (S )

C ( R  S )   A' ( R)   B ' (S )

Hoa Dinh Nguyen, PhD.


Lecture 3: Query processing and optimization 49 Department of Information Technology
Query Decomposition
These rules can be used in four different ways:
 First, they allow the separation of the unary operations,
simplifying the query expression.
www.ptit.edu.vn

 Second, unary operations on the same relation may be


grouped so that access to a relation for performing unary
operations can be done only once.
 Third, unary operations can be commuted with binary
operations so that some operations (e.g., selection) may be
done first.
 Fourth, the binary operations can be ordered.

Hoa Dinh Nguyen, PhD.


Lecture 3: Query processing and optimization 50 Department of Information Technology
www.ptit.edu.vn Query Decomposition

Hoa Dinh Nguyen, PhD.


Lecture 3: Query processing and optimization 51 Department of Information Technology
www.ptit.edu.vn Query Decomposition

Hoa Dinh Nguyen, PhD.


Lecture 3: Query processing and optimization 52 Department of Information Technology
Localization of Distributed Data
 General techniques for decomposing and restructuring
queries apply to both centralized and distributed DBMSs and
do not take into account the distribution of data.
www.ptit.edu.vn

 Localization layer translates an algebraic query on global


relations into an algebraic query expressed on physical
fragments
 Localization uses information stored in the fragment schema
 A global relation can be reconstructed by applying the
reconstruction (or reverse fragmentation) rules and deriving
a relational algebra program whose operands are the
fragments. We call this a localization program.
Hoa Dinh Nguyen, PhD.
Lecture 3: Query processing and optimization 53 Department of Information Technology
Localization of Distributed Data
 A naive way to localize a distributed query is to generate a
query where each global relation is substituted by its
localization program.
www.ptit.edu.vn

 Localized query: replacing the leaves of the operator tree of


the distributed query with subtrees corresponding to the
localization programs
 Reduction techniques: generate simpler and optimized
queries by using the transformation rules and the heuristics,
such as pushing unary operations down the tree

Hoa Dinh Nguyen, PhD.


Lecture 3: Query processing and optimization 54 Department of Information Technology
Reduction for PHF
 The horizontal fragmentation function distributes a relation
based on selection predicates
Example: Relation EMP(ENO, ENAME, TITLE) can be split
www.ptit.edu.vn

into three horizontal fragments EMP1, EMP2, and EMP3


1. EMP1 = 𝜎𝐸𝑁𝑂≤"𝐸3" (EMP)
2. EMP2 = 𝜎"𝐸3"<𝐸𝑁𝑂≤"𝐸6" (EMP)
3. EMP3 = 𝜎𝐸𝑁𝑂>"𝐸6" (EMP)
 The localization program for an horizontally fragmented
relation is the union of the fragments
EMP = EMP1 EMP2  EMP3
Hoa Dinh Nguyen, PhD.
Lecture 3: Query processing and optimization 55 Department of Information Technology
Reduction with Selection
 Selections on fragments that have a qualification
contradicting the qualification of the fragmentation rule
generate empty relations.
www.ptit.edu.vn

 Given a relation R that has been horizontally fragmented as


R1, R2, …, RN, where Ri = 𝜎𝑝𝑖 (R), the rule can be stated
formally as follows:
 Rule 1:𝜎𝑝𝑗 𝑅𝑖 = ∅ if ∀𝑥 in R: ¬ 𝑝𝑖 𝑥 ⋀𝑝𝑗 𝑥
where pi and pj are selection predicates, x denotes a tuple, and
p(x) denotes “predicate p holds for x.”

Hoa Dinh Nguyen, PhD.


Lecture 3: Query processing and optimization 56 Department of Information Technology
Reduction with Selection
 Example:
SELECT *
www.ptit.edu.vn

FROM EMP
WHERE ENO = “E5” ;

Hoa Dinh Nguyen, PhD.


Lecture 3: Query processing and optimization 57 Department of Information Technology
Reduction with Join
 Joins on horizontally fragmented relations can be simplified
when the joined relations are fragmented according to the
join attribute.
www.ptit.edu.vn

 The simplification consists of distributing joins over unions


and eliminating useless joins
𝑅1 ∪ 𝑅2 ⋈ 𝑆 = 𝑅1 ⋈ 𝑆 ∪ 𝑅2 ⋈ 𝑆
where Ri are fragments of R and S is a relation.
 Rule 2: 𝑅𝑖 ⋈ 𝑅𝑗 = ∅ if ∀𝑥 in Ri, ∀𝑦 in Rj :
¬ 𝑝𝑖 𝑥 ⋀𝑝𝑗 𝑦
fragments Ri and Rj are defined, respectively, according to
predicates pi and pj on the same attribute
Hoa Dinh Nguyen, PhD.
Lecture 3: Query processing and optimization 58 Department of Information Technology
Reduction with Join
 Example: Assume that relation EMP is fragmented between
EMP1, EMP2, and EMP3, as above, and that relation ASG is
fragmented as
www.ptit.edu.vn

1. ASG1 = 𝜎𝐸𝑁𝑂≤"𝐸3" (ASG)


2. ASG2 = 𝜎𝐸𝑁𝑂>"𝐸3" (ASG)
 Consider the join query:
SELECT *
FROM EMP, ASG
WHERE EMP.ENO = ASG.ENO;

Hoa Dinh Nguyen, PhD.


Lecture 3: Query processing and optimization 59 Department of Information Technology
www.ptit.edu.vn Reduction with Join

Hoa Dinh Nguyen, PhD.


Lecture 3: Query processing and optimization 60 Department of Information Technology
Reduction for Vertical Fragmentation
 The vertical fragmentation function distributes a relation
based on projection attributes
 The localization program for a VF relation consists of the
www.ptit.edu.vn

join of the fragments on the common attribute.


 Example: Relation EMP can be divided into two vertical
fragments where the key attribute ENO is duplicated:
1. EMP1 = ∏𝐸𝑁𝑂,𝐸𝑁𝐴𝑀𝐸 (EMP)
2. EMP2 = ∏𝐸𝑁𝑂,𝑇𝐼𝑇𝐿𝐸 (EMP)
 The localization program is
EMP = EMP1 ⋈𝐸𝑁𝑂 EMP2
Hoa Dinh Nguyen, PhD.
Lecture 3: Query processing and optimization 61 Department of Information Technology
Reduction for Vertical Fragmentation
 Queries on vertical fragments can be reduced by
determining the useless intermediate relations and removing
the subtrees that produce them
www.ptit.edu.vn

 Projections on a vertical fragment that has no attributes in


common with the projection attributes (except the key of
the relation) produce useless, though not empty relations.
 Rule 3: ∏𝐷,𝐾 𝑅𝑖 is useless if the set of projection
attributes D is not in A’
Where relation R(A1, A2,…, An) is vertically fragmented as
𝑅𝑖 = ∏𝐴′ 𝑅 , and 𝐴′ ⊆ 𝐴1 , 𝐴2 , … , 𝐴𝑛

Hoa Dinh Nguyen, PhD.


Lecture 3: Query processing and optimization 62 Department of Information Technology
Reduction for Vertical Fragmentation
 Example: consider the query
SELECT ENAME
www.ptit.edu.vn

FROM EMP

Hoa Dinh Nguyen, PhD.


Lecture 3: Query processing and optimization 63 Department of Information Technology
Reduction for Derived Fragmentation
 If relation R is subject to derived horizontal fragmentation
due to relation S, the fragments of R and S that have the
same join attribute values are located at the same site.
www.ptit.edu.vn

 S can be fragmented according to a selection predicate.


 Derived fragmentation should be used only for one-to-many
(hierarchical) relationships of the form S → R, where a
tuple of S can match with n tuples of R, but a tuple of R
matches with exactly one tuple of S.
Note that derived fragmentation could be used for many-to-many
relationships provided that tuples of S (that match with n tuples of R)
are replicated.
Hoa Dinh Nguyen, PhD.
Lecture 3: Query processing and optimization 64 Department of Information Technology
Reduction for Derived Fragmentation
 Example: Given a one-to-many relationship from EMP to
ASG, relation ASG(ENO, PNO, RESP, DUR) can be
indirectly fragmented according to the following rules:
www.ptit.edu.vn

1. ASG1 = ASG ⋈𝐸𝑁𝑂 EMP1


2. ASG2 = ASG ⋈𝐸𝑁𝑂 EMP2
Where: EMP1 = 𝜎𝑇𝐼𝑇𝐿𝐸="𝑃𝑟𝑜𝑔𝑟𝑎𝑚𝑚𝑒𝑟" (EMP)
EMP2 = 𝜎𝑇𝐼𝑇𝐿𝐸≠"𝑃𝑟𝑜𝑔𝑟𝑎𝑚𝑚𝑒𝑟" (EMP)
 The localization program for a horizontally fragmented
relation is the union of the fragments
ASG = ASG1  ASG2
Hoa Dinh Nguyen, PhD.
Lecture 3: Query processing and optimization 65 Department of Information Technology
Reduction for Derived Fragmentation
 Queries on derived fragments can also be reduced: certain
joins will produce empty relations if the fragmentation
predicates conflict.
www.ptit.edu.vn

 Example: the predicates of ASG1 and EMP2 conflict, thus


ASG1 ⋈ EMP2 = Ø
 The reduced query is always preferable to the localized
query because the number of partial joins usually equals the
number of fragments of R.

Hoa Dinh Nguyen, PhD.


Lecture 3: Query processing and optimization 66 Department of Information Technology
Reduction for Derived Fragmentation
 Example:
SELECT *
FROM EMP, ASG
www.ptit.edu.vn

WHERE ASG.ENO = EMP.ENO


AND TITLE = "Mech. Eng."

Hoa Dinh Nguyen, PhD.


Lecture 3: Query processing and optimization 67 Department of Information Technology
www.ptit.edu.vn Reduction for Derived Fragmentation

Hoa Dinh Nguyen, PhD.


Lecture 3: Query processing and optimization 68 Department of Information Technology
www.ptit.edu.vn Reduction for Derived Fragmentation

Hoa Dinh Nguyen, PhD.


Lecture 3: Query processing and optimization 69 Department of Information Technology
Reduction for Hybrid Fragmentation

 The goal of hybrid fragmentation is to efficiently support


queries involving projection, selection, and join.
www.ptit.edu.vn

 The optimization of an operation or of a combination of


operations is always done at the expense of other
operations.
 The localization program for a hybrid fragmented
relation uses unions and joins of fragments

Hoa Dinh Nguyen, PhD.


Lecture 3: Query processing and optimization 70 Department of Information Technology
Reduction for Hybrid Fragmentation

 Example:
1. EMP1 = 𝜎𝐸𝑁𝑂≤"𝐸4" ∏𝐸𝑁𝑂,𝐸𝑁𝐴𝑀𝐸 𝐸𝑀𝑃
www.ptit.edu.vn

2. EMP2 = 𝜎𝐸𝑁𝑂>"𝐸4" ∏𝐸𝑁𝑂,𝐸𝑁𝐴𝑀𝐸 𝐸𝑀𝑃


3. EMP3 = ∏𝐸𝑁𝑂,𝑇𝐼𝑇𝐿𝐸 𝐸𝑀𝑃
The localization program is
𝐸𝑀𝑃 = 𝐸𝑀𝑃1 ∪ 𝐸𝑀𝑃2 ⋈𝐸𝑁𝑂 𝐸𝑀𝑃3

Hoa Dinh Nguyen, PhD.


Lecture 3: Query processing and optimization 71 Department of Information Technology
Reduction for Hybrid Fragmentation

 Queries on hybrid fragments can be reduced by


combining the rules:
www.ptit.edu.vn

1. Remove empty relations generated by contradicting


selections on horizontal fragments.
2. Remove useless relations generated by projections on
vertical fragments.
3. Distribute joins over unions in order to isolate and
remove useless joins.

Hoa Dinh Nguyen, PhD.


Lecture 3: Query processing and optimization 72 Department of Information Technology
Reduction for Hybrid Fragmentation
 Example: application of rules (1) and (2)
SELECT ENAME
FROM EMP
www.ptit.edu.vn

WHERE ENO="E5“

Hoa Dinh Nguyen, PhD.


Lecture 3: Query processing and optimization 73 Department of Information Technology
Query Optimization in DDBS
 Query decomposition and data localization are the two
successive functions that map a calculus query, expressed on
distributed relations, into an algebraic query (query
www.ptit.edu.vn

decomposition), expressed on relation fragments (data


localization).
 Query decomposition can generate an algebraic query simply
by translating into relational operations the predicates and
the target statement as they appear.
 Data localization can, in turn, express this algebraic query on
relation fragments, by substituting for each distributed
relation an algebraic query corresponding to its
fragmentation rules.

Hoa Dinh Nguyen, PhD.


Lecture 3: Query processing and optimization 76 Department of Information Technology

You might also like