Distributed Cost Model
Fall 2012
Basic Concepts
Database Statistics
DDBS12, SL05
M. Bohlen
Basic Concepts/1
Basic Concepts/2
Basic Concepts/3
Basic Concepts/4
Search strategies: deterministic and randomized
Deterministic strategies start from the base relations and build plans by adding one relation at each step:
Breadth-first strategy (BFS): build all possible plans before choosing the best plan (dynamic programming approach)
Depth-first strategy (DFS): build only one plan (greedy approach)
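The two deterministic strategies can be contrasted with a minimal Python sketch over left-deep join orders. Everything numeric here is a hypothetical assumption, not taken from the slides: the cardinalities in `CARD`, the pairwise selectivity factors in `SF`, and the cost measure (sum of intermediate result sizes). `dp_order` stands for the breadth-first (dynamic programming) strategy and `greedy_order` for the depth-first (greedy) one; both names are illustrative.

```python
from itertools import combinations

# Hypothetical statistics (assumption, not from the slides).
CARD = {"R1": 1000, "R2": 200, "R3": 50}
SF = {frozenset({"R1", "R2"}): 0.01,
      frozenset({"R1", "R3"}): 0.05,
      frozenset({"R2", "R3"}): 0.1}


def join_size(size, joined, rel):
    """Estimated size after joining an intermediate result over `joined` with `rel`."""
    sf = 1.0
    for other in joined:
        sf *= SF.get(frozenset({other, rel}), 1.0)  # 1.0: no predicate (product)
    return size * CARD[rel] * sf


def dp_order(rels):
    """BFS / dynamic programming: cheapest left-deep plan for every subset, by size."""
    best = {frozenset({r}): (0.0, CARD[r], (r,)) for r in rels}
    for k in range(2, len(rels) + 1):
        for subset in map(frozenset, combinations(rels, k)):
            candidates = []
            for rel in subset:
                cost, size, order = best[subset - {rel}]
                new_size = join_size(size, order, rel)
                candidates.append((cost + new_size, new_size, order + (rel,)))
            best[subset] = min(candidates)  # keep one plan per subset
    return best[frozenset(rels)]


def greedy_order(rels):
    """DFS / greedy: start with the smallest relation, always add the cheapest next one."""
    order = [min(rels, key=CARD.get)]
    cost, size = 0.0, CARD[order[0]]
    while len(order) < len(rels):
        # estimated size after adding each remaining relation
        sizes = {r: join_size(size, order, r) for r in rels if r not in order}
        rel = min(sizes, key=sizes.get)
        size = sizes[rel]
        cost += size
        order.append(rel)
    return cost, size, tuple(order)


print(dp_order(["R1", "R2", "R3"]))
print(greedy_order(["R1", "R2", "R3"]))
```

Under this cost model `dp_order` always finds the cheapest left-deep order, at the price of examining every subset of relations; `greedy_order` commits to one choice per step and can miss it.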
Basic Concepts/5
Randomized strategies do not guarantee that the best solution is obtained, but they avoid the high cost of optimization.
They are preferable when more than 5-6 relations are involved.
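Iterative improvement with random restarts is one well-known randomized strategy (simulated annealing is another). The sketch below uses a simple left-deep cost model; `CARD`, `SF`, `cost`, and `iterative_improvement` are hypothetical names and numbers, not from the slides. The brute-force `optimum` is computed only to check the result and is feasible only because there are four relations.

```python
import random
from itertools import permutations

# Hypothetical statistics (assumption, not from the slides).
CARD = {"R1": 1000, "R2": 200, "R3": 50, "R4": 400}
SF = {frozenset({"R1", "R2"}): 0.01, frozenset({"R2", "R3"}): 0.1,
      frozenset({"R3", "R4"}): 0.02, frozenset({"R1", "R4"}): 0.005}


def cost(order):
    """Sum of intermediate result sizes of a left-deep join order."""
    size, total = CARD[order[0]], 0.0
    for i in range(1, len(order)):
        sf = 1.0
        for prev in order[:i]:
            sf *= SF.get(frozenset({prev, order[i]}), 1.0)
        size = size * CARD[order[i]] * sf
        total += size
    return total


def iterative_improvement(rels, restarts=5, seed=0):
    """Random start orders, each improved by pairwise swaps until no swap helps."""
    rng = random.Random(seed)
    best = None
    for _ in range(restarts):
        order = list(rels)
        rng.shuffle(order)  # random starting plan
        improved = True
        while improved:  # stop at a local minimum
            improved = False
            for i in range(len(order)):
                for j in range(i + 1, len(order)):
                    cand = order[:]
                    cand[i], cand[j] = cand[j], cand[i]
                    if cost(cand) < cost(order):
                        order, improved = cand, True
        if best is None or cost(order) < cost(best):
            best = order
    return best, cost(best)


best, c = iterative_improvement(["R1", "R2", "R3", "R4"])
optimum = min(cost(p) for p in permutations(["R1", "R2", "R3", "R4"]))
print(best, c, optimum)
```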
Database Statistics/1
Database Statistics/2
Database Statistics/3
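Selectivity factors of selections, assuming uniformly distributed attribute values:
$SF(A = value) = \frac{1}{card(\pi_A(R))}$
$SF(A > value) = \frac{max(A) - value}{max(A) - min(A)}$
$SF(A < value) = \frac{value - min(A)}{max(A) - min(A)}$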
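The selectivity formulas translate directly into code. A minimal sketch, assuming uniformly distributed values and max(A) > min(A); the helper names and the sample statistics are hypothetical, and `card_selection` applies the selection rule card(σ_P(R)) = SF(P) · card(R) used in these slides.

```python
def sf_eq(distinct_values):
    """SF(A = value) = 1 / card(pi_A(R))."""
    return 1.0 / distinct_values


def _clamp(x):
    # constants outside [min(A), max(A)] would otherwise give SF < 0 or SF > 1
    return min(1.0, max(0.0, x))


def sf_gt(value, lo, hi):
    """SF(A > value) = (max(A) - value) / (max(A) - min(A)); requires hi > lo."""
    return _clamp((hi - value) / (hi - lo))


def sf_lt(value, lo, hi):
    """SF(A < value) = (value - min(A)) / (max(A) - min(A)); requires hi > lo."""
    return _clamp((value - lo) / (hi - lo))


def card_selection(sf, card_r):
    """card(sigma_P(R)) = SF(P) * card(R)."""
    return sf * card_r


# Hypothetical relation: 10,000 tuples, A in [0, 100] with 50 distinct values.
print(card_selection(sf_eq(50), 10_000))          # A = v
print(card_selection(sf_gt(75, 0, 100), 10_000))  # A > 75
print(card_selection(sf_lt(20, 0, 100), 10_000))  # A < 20
```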
Database Statistics/4
Database Statistics/5
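Selection: $card(\sigma_P(R)) = SF(P) \cdot card(R)$
Projection: $card(\pi_A(R)) = card(R)$ if A is a key of R (otherwise duplicate elimination can make the result smaller)
Cartesian Product: $card(R \times S) = card(R) \cdot card(S)$
Union: $\max(card(R), card(S)) \le card(R \cup S) \le card(R) + card(S)$
Set Difference: $0 \le card(R - S) \le card(R)$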
Database Statistics/6
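Cardinality of joins
Upper bound (Cartesian product): $card(R \bowtie S) \le card(R) \cdot card(S)$
If R.A is a key of R and S.A is a foreign key of S referencing R.A: $card(R \bowtie_{R.A = S.A} S) = card(S)$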
Database Statistics/7
Selectivity factor of a semijoin: $SF_{\ltimes}(R \ltimes_A S) = \frac{card(\pi_A(S))}{card(dom[A])}$
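The join and semijoin estimates can be sketched the same way. The function names and statistics below are hypothetical; `semijoin_card` assumes the semijoin selectivity is applied to card(R) just as SF(P) is applied in a selection, which again presumes uniformly distributed values.

```python
def join_card_upper_bound(card_r, card_s):
    """card(R join S) <= card(R) * card(S), the size of the Cartesian product."""
    return card_r * card_s


def fk_join_card(card_s):
    """card(R join S on R.A = S.A) = card(S) if R.A is a key and S.A a foreign key."""
    return card_s  # every S-tuple matches exactly one R-tuple


def semijoin_sf(card_pi_a_s, card_dom_a):
    """SF(R semijoin_A S) = card(pi_A(S)) / card(dom[A])."""
    return card_pi_a_s / card_dom_a


def semijoin_card(sf, card_r):
    """Estimated card(R semijoin_A S) = SF * card(R)."""
    return sf * card_r


# Hypothetical statistics: card(R) = 1000, card(S) = 5000,
# dom[A] has 1000 values, and S contains 250 distinct A values.
sf = semijoin_sf(250, 1000)
print(join_card_upper_bound(1000, 5000), fk_join_card(5000), sf, semijoin_card(sf, 1000))
```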
Plan 1:
EMP → Site 2
Site 2 computes EMP' = EMP ⋈ ASG
EMP' → Site 3
Site 3 computes EMP' ⋈ PROJ
Plan 2:
ASG → Site 1
Site 1 computes EMP' = EMP ⋈ ASG
EMP' → Site 3
Site 3 computes EMP' ⋈ PROJ
Plan 3:
ASG → Site 3
Site 3 computes ASG' = ASG ⋈ PROJ
ASG' → Site 1
Site 1 computes ASG' ⋈ EMP
Plan 4:
PROJ → Site 2
Site 2 computes PROJ' = PROJ ⋈ ASG
PROJ' → Site 1
Site 1 computes PROJ' ⋈ EMP
Plan 5:
EMP → Site 2
PROJ → Site 2
Site 2 computes EMP ⋈ PROJ ⋈ ASG
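Reading the plans, EMP is stored at Site 1, ASG at Site 2, and PROJ at Site 3. The sketch below ranks the plans by the number of bytes each one ships between sites. All sizes in `SIZE` are hypothetical (the slides give none here), and the model ignores local processing and the shipping of the final result, so it is only a rough first filter. Plans 3 and 4 ship the same intermediate result, since ASG ⋈ PROJ = PROJ ⋈ ASG.

```python
# Hypothetical sizes in bytes (assumption; no sizes are given for this example).
SIZE = {"EMP": 400, "ASG": 1000, "PROJ": 100,
        "EMP JOIN ASG": 1000, "ASG JOIN PROJ": 500}

# What each plan ships between sites (the final result is not counted).
PLANS = {
    "Plan 1": ["EMP", "EMP JOIN ASG"],    # EMP -> Site 2, EMP' -> Site 3
    "Plan 2": ["ASG", "EMP JOIN ASG"],    # ASG -> Site 1, EMP' -> Site 3
    "Plan 3": ["ASG", "ASG JOIN PROJ"],   # ASG -> Site 3, ASG' -> Site 1
    "Plan 4": ["PROJ", "ASG JOIN PROJ"],  # PROJ -> Site 2, PROJ' -> Site 1
    "Plan 5": ["EMP", "PROJ"],            # EMP and PROJ -> Site 2
}


def transfer_cost(shipped):
    """Total number of bytes shipped by a plan."""
    return sum(SIZE[item] for item in shipped)


costs = {plan: transfer_cost(items) for plan, items in PLANS.items()}
best = min(costs, key=costs.get)
print(costs, best)
```

With other sizes the ranking can change completely, which is why the optimizer needs the database statistics.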
$R \bowtie_A S = R \bowtie_A (S \ltimes_A R) = (R \ltimes_A S) \bowtie_A (S \ltimes_A R)$
Join approach:
R → Site 2
Site 2 computes R ⋈ S
Semijoin approach:
S' = π_A(S)
S' → Site 1
Site 1 computes R' = R ⋉ S'
R' → Site 2
Site 2 computes R' ⋈ S
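A small simulation of the two approaches, assuming R is stored at Site 1 and S at Site 2 (as the steps imply). The tuples and the byte widths `VALUE_BYTES` and `R_TUPLE_BYTES` are hypothetical.

```python
# Hypothetical data: R at Site 1, S at Site 2, join attribute A.
R = [{"A": a, "X": f"r{a}"} for a in range(1, 9)]    # 8 tuples
S = [{"A": a, "Y": f"s{a}"} for a in (2, 4, 4, 7)]   # 4 tuples
VALUE_BYTES, R_TUPLE_BYTES = 4, 40                    # hypothetical widths


def join(r, s, attr):
    """Nested-loop equi-join on `attr`."""
    return [{**tr, **ts} for tr in r for ts in s if tr[attr] == ts[attr]]


# Join approach: ship all of R to Site 2 and join there.
join_bytes = len(R) * R_TUPLE_BYTES
join_result = join(R, S, "A")

# Semijoin approach.
s_prime = {t["A"] for t in S}                   # S' = pi_A(S), shipped to Site 1
r_prime = [t for t in R if t["A"] in s_prime]   # R' = R semijoin S', at Site 1
semijoin_bytes = len(s_prime) * VALUE_BYTES + len(r_prime) * R_TUPLE_BYTES
semijoin_result = join(r_prime, S, "A")         # R' join S, at Site 2

print(join_bytes, semijoin_bytes, join_result == semijoin_result)
```

Here the semijoin ships fewer bytes because only three of the eight R-tuples have a partner in S; if almost every R-tuple matched, shipping S' would be pure overhead.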
INGRES Algorithm/1
INGRES Algorithm/2
The query
q: SELECT R2.A2, ..., Rn.An
   FROM R1, R2, ..., Rn
   WHERE P1(R1.A1')
   AND P2(R1.A1, ..., Rn.An)
is decomposed by detachment into q', which evaluates the one-variable predicate P1 on R1 and stores the result in R1', followed by
q'': SELECT R2.A2, ..., Rn.An
     FROM R1', R2, ..., Rn
     WHERE P2(R1'.A1, ..., Rn.An)
Detachment reduces the size of the relation on which the query q'' is defined.
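Detachment can be illustrated on two in-memory relations. The predicates `p1`, `p2` and the data are hypothetical; the point is that q'' iterates over the reduced R1' instead of R1 while returning the same answer.

```python
# Hypothetical relations; tuples are dicts.
R1 = [{"A1": i, "K": i % 3} for i in range(9)]
R2 = [{"A1": i, "A2": f"v{i}"} for i in range(9)]


def p1(t1):
    """Hypothetical one-variable predicate P1 on R1."""
    return t1["K"] == 0


def p2(t1, t2):
    """Hypothetical multi-variable predicate P2 linking R1 and R2."""
    return t1["A1"] == t2["A1"]


# q without detachment: the nested loop visits every pair of R1 x R2.
q = [t2["A2"] for t1 in R1 for t2 in R2 if p1(t1) and p2(t1, t2)]

# q': evaluate P1 alone and keep the result as R1'.
R1_prime = [t1 for t1 in R1 if p1(t1)]
# q'': the remaining query runs over R1' instead of R1.
q_detached = [t2["A2"] for t1 in R1_prime for t2 in R2 if p2(t1, t2)]

print(q == q_detached, len(R1) * len(R2), len(R1_prime) * len(R2))
```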
INGRES Algorithm/3
SELECT EMP.ENAME
FROM EMP, ASG, JVAR
WHERE EMP.ENO = ASG.ENO
AND ASG.PNO = JVAR.PNO
INGRES Algorithm/4
q12: SELECT ASG.ENO
     INTO GVAR
     FROM ASG, JVAR
     WHERE ASG.PNO = JVAR.PNO
q13: SELECT EMP.ENAME
     FROM EMP, GVAR
     WHERE EMP.ENO = GVAR.ENO
INGRES Algorithm/5
q131: SELECT EMP.ENAME
      FROM EMP
      WHERE EMP.ENO = E1
INGRES Algorithm/6
Example (contd.):
q132: SELECT EMP.ENAME
      FROM EMP
      WHERE EMP.ENO = E2
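The decomposition into q12 and q13, and the tuple substitution that turns q13 into one-variable queries such as q131 and q132, can be traced on toy data. The tuples below, including the assumption that JVAR contains only project P1, are hypothetical.

```python
# Hypothetical contents (assumption; the slides list no tuples).
EMP = [("E1", "J. Doe"), ("E2", "M. Smith"), ("E3", "A. Lee")]
ASG = [("E1", "P1"), ("E2", "P1"), ("E3", "P2")]
JVAR = [("P1",)]  # assumed result of an earlier detached subquery

# q12: SELECT ASG.ENO INTO GVAR FROM ASG, JVAR WHERE ASG.PNO = JVAR.PNO
GVAR = [(eno,) for (eno, pno) in ASG for (jvar_pno,) in JVAR if pno == jvar_pno]


def one_variable_query(eno):
    """q13 with one GVAR tuple substituted: SELECT EMP.ENAME FROM EMP WHERE EMP.ENO = eno."""
    return [ename for (e, ename) in EMP if e == eno]


# Tuple substitution: one one-variable query per GVAR tuple (q131, q132, ...).
result = []
for (eno,) in GVAR:
    result.extend(one_variable_query(eno))

print(GVAR, result)
```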
System R Algorithm/1
System R Algorithm/2
Join graph
Indexes
System R Algorithm/3
System R Algorithm/4
Example (contd.): Step 2: Select the best join ordering for each relation
(EMP × PROJ) and (PROJ × EMP) are pruned because they are Cartesian products
(ASG ⋈ PROJ) is pruned because (we assume) it has a higher cost than (PROJ ⋈ ASG); similarly for (ASG ⋈ EMP)
The best total join order is ((PROJ ⋈ ASG) ⋈ EMP), since it makes the best use of the indexes
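The pruning argument can be reproduced with a simplified dynamic program in the spirit of System R: it keeps only the cheapest left-deep plan per subset of relations and never extends a plan with a relation that has no join predicate to it. The join graph has edges EMP-ASG and ASG-PROJ, as implied by the pruning of EMP × PROJ; the step costs in the hypothetical `STEP_COST` table are invented so that they agree with the assumptions on the slide. Real System R also tracks interesting orders, which this sketch omits.

```python
from itertools import combinations

RELS = ["EMP", "ASG", "PROJ"]
# Join graph: no predicate links EMP and PROJ directly.
EDGES = {frozenset({"EMP", "ASG"}), frozenset({"ASG", "PROJ"})}
# Hypothetical cost of joining an outer set of relations with one inner relation.
STEP_COST = {
    (frozenset({"EMP"}), "ASG"): 50, (frozenset({"ASG"}), "EMP"): 80,
    (frozenset({"ASG"}), "PROJ"): 70, (frozenset({"PROJ"}), "ASG"): 30,
    (frozenset({"EMP", "ASG"}), "PROJ"): 40, (frozenset({"ASG", "PROJ"}), "EMP"): 20,
}


def connected(outer, inner):
    """True if some join predicate links `inner` to a relation in `outer`."""
    return any(frozenset({o, inner}) in EDGES for o in outer)


def best_plan(rels):
    """Cheapest left-deep plan per subset of relations, never forming Cartesian products."""
    best = {frozenset({r}): (0, r) for r in rels}
    for k in range(2, len(rels) + 1):
        for subset in map(frozenset, combinations(rels, k)):
            options = []
            for inner in subset:
                outer = subset - {inner}
                if outer not in best or not connected(outer, inner):
                    continue  # pruned: this step would be a Cartesian product
                cost, expr = best[outer]
                options.append((cost + STEP_COST[(outer, inner)], f"({expr} JOIN {inner})"))
            if options:
                best[subset] = min(options)  # keep only the cheapest plan for this subset
    return best.get(frozenset(rels))


print(best_plan(RELS))
```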
Join ordering and data transfer between different sites are the most
critical issues to be considered by the master site
Ship whole: The entire relation is shipped to the join site and stored
in a temporary relation
Notation:
R is outer relation
S is inner relation
LT denotes local processing time
CT denotes communication time
s denotes the average number of S-tuples that match an R-tuple
Strategy 1: Ship the entire outer relation to the site of the inner relation; the outer tuples can be joined with S as they arrive.
Strategy 2: Ship the entire inner relation to the site of the outer relation. The inner tuples cannot be joined as they arrive; they must first be stored in a temporary relation.
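A rough comparison of Strategies 1 and 2 in terms of the notation above (LT, CT, s). Purely for illustration, it assumes that LT grows linearly with the number of tuples and CT linearly with the number of bytes; `lt`, `ct`, the unit costs, and the statistics are hypothetical. Strategy 2 additionally pays for retrieving, shipping, and storing all of S before the join can start.

```python
def lt(n_tuples, per_tuple=1.0):
    """Hypothetical local processing time: linear in the number of tuples handled."""
    return n_tuples * per_tuple


def ct(n_bytes, per_byte=0.01):
    """Hypothetical communication time: linear in the number of bytes shipped."""
    return n_bytes * per_byte


def strategy1(card_r, size_r, s):
    """Ship outer R to S's site; each arriving R-tuple retrieves its s matching S-tuples."""
    return lt(card_r) + ct(size_r) + card_r * lt(s)


def strategy2(card_r, card_s, size_s, s):
    """Ship inner S to R's site, store it as a temporary T, then probe T per R-tuple."""
    retrieve_and_ship = lt(card_s) + ct(size_s)
    store_temp = lt(card_s)
    return retrieve_and_ship + store_temp + lt(card_r) + card_r * lt(s)


# Hypothetical statistics: a small outer relation and a large inner one.
cost1 = strategy1(card_r=1_000, size_r=100_000, s=2)
cost2 = strategy2(card_r=1_000, card_s=10_000, size_s=1_000_000, s=2)
print(cost1, cost2)
```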
Strategy 4: Move both relations to a third site and compute the join
there.
Hill-Climbing Algorithm/1
Hill-Climbing Algorithm/2
ES1: send one of the relations involved in the join to the other relation's site
ES2: send the join result to the final result site
Hill-Climbing Algorithm/3
Hill-Climbing Algorithm/4
Hill-Climbing Algorithm/5
Alternative 1: ES1, ES2, ES3
ES1: EMP → Site 2
ES2: (EMP ⋈ PAY) → Site 4
ES3: PROJ → Site 4
Total cost = 8 + 8 + 1 = 17
Alternative 2: ES1, ES2, ES3
Total cost = 4 + 8 + 1 = 13
Neither alternative is better than ES0, so keep ES0 (or take Alternative 2, which has the same cost).
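The decision can be checked mechanically. This is a minimal sketch of hill climbing over precomputed strategy costs: the step costs are those of the example, while ES0's total of 13 is inferred from the remark that Alternative 2 costs the same. The dictionaries `COST` and `NEIGHBOURS` and the function `hill_climb` are hypothetical.

```python
# Step costs from the example; ES0's total is inferred (same cost as Alternative 2).
COST = {
    "ES0": 13,
    "Alternative 1": 8 + 8 + 1,
    "Alternative 2": 4 + 8 + 1,
}
NEIGHBOURS = {"ES0": ["Alternative 1", "Alternative 2"]}


def hill_climb(start, cost, neighbours):
    """Move to the cheapest strictly better neighbour until there is none."""
    current = start
    while True:
        better = [n for n in neighbours.get(current, []) if cost[n] < cost[current]]
        if not better:
            return current  # local optimum reached
        current = min(better, key=cost.get)


print(hill_climb("ES0", COST, NEIGHBOURS))
```

Because only strictly cheaper neighbours are accepted, the climb stops at ES0; if Alternative 2 cost, say, 12, it would move there.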
Hill-Climbing Algorithm/6
Problems
SDD-1
Considers semijoins
Relation   size(R)
R1         1500
R2         3000
R3         2000

Attribute  SF_⋉   size(π_Attribute)
R1.A       0.3    36
R2.A       0.8    320
R2.B       1.0    400
R3.B       0.4    80
SDD-1 Algorithm
Step 1: Include all local processing in the execution strategy ES.
Step 2: Update the database profile with the effects of local processing.
Step 3: Determine beneficial semijoins.
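The database profile is enough to rate candidate semijoins. This sketch assumes the query R1 ⋈_A R2 ⋈_B R3 (read off the attribute names in the profile) and SDD-1 style measures that the slides do not spell out here: the cost of R ⋉ S is the size of the shipped projection of S on the join attribute, and its benefit is (1 - SF_⋉) · size(R), the part of R that no longer has to be shipped. The names `CANDIDATES`, `cost`, and `benefit` are hypothetical.

```python
# Database profile from the example.
SIZE = {"R1": 1500, "R2": 3000, "R3": 2000}
SF_SJ = {"R1.A": 0.3, "R2.A": 0.8, "R2.B": 1.0, "R3.B": 0.4}
SIZE_ATTR = {"R1.A": 36, "R2.A": 320, "R2.B": 400, "R3.B": 80}

# Candidate semijoins R semijoin S: (relation to reduce, join attribute of S).
CANDIDATES = [("R2", "R1.A"), ("R2", "R3.B"), ("R1", "R2.A"), ("R3", "R2.B")]


def cost(s_attr):
    """Assumed cost: ship the projection of S on the join attribute."""
    return SIZE_ATTR[s_attr]


def benefit(r, s_attr):
    """Assumed benefit: the part of R eliminated by the semijoin."""
    return (1 - SF_SJ[s_attr]) * SIZE[r]


beneficial = [(r, s) for r, s in CANDIDATES if benefit(r, s) > cost(s)]
for r, s in CANDIDATES:
    print(f"{r} semijoin on {s}: cost={cost(s)}, benefit={benefit(r, s):.0f}")
print(beneficial)
```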
Conclusion
Course Project
problem definition
running example
description of solution
evaluation
strength, weaknesses, limitations
Course Exam
oral, 20 minutes
10 minutes about project (demo, code, algorithm)
10 minutes about a topic of the course