Distributed Cost Model

This document discusses distributed query optimization. It begins with basic concepts like centralized versus distributed query optimization and search space reduction techniques. It then covers distributed cost models that optimize for total time or response time. Next it discusses using database statistics and selectivity factors to estimate intermediate result sizes for operations. It describes considerations for join ordering when relations are fragmented across multiple sites. Finally, it discusses using semijoins to efficiently implement joins by reducing relation sizes before transferring them between sites.


Distributed Database Systems

Fall 2012

Distributed Query Optimization

SL05

Basic Concepts

Distributed Cost Model

Database Statistics

Joins and Semijoins

Query Optimization Algorithms

DDBS12, SL05

1/52

M. Bohlen

Basic Concepts/1
- Query optimization: the process of producing an optimal (or close to optimal) query execution plan, which represents an execution strategy
  - The main task in query optimization is to consider different orderings of the operations
- Centralized query optimization:
  - Find (the best) query execution plan in the space of equivalent query trees
  - Minimize an objective cost function
  - Gather statistics about relations
- Distributed query optimization brings additional issues
  - Linear query trees are not necessarily a good choice
  - Bushy query trees are not necessarily a bad choice
  - What and where to ship the relations
  - How to ship relations (ship as a whole, ship as needed)
  - When to use semi-joins instead of joins

Basic Concepts/2
- Search space: the set of alternative query execution plans (query trees)
  - Typically very large
  - The main issue is to optimize joins
  - For N relations, there are $O(N!)$ equivalent join trees that can be obtained by applying commutativity and associativity rules
- Example: 3 equivalent query trees (join trees) of the joins in the following query

      SELECT ENAME, RESP
      FROM   EMP, ASG, PROJ
      WHERE  EMP.ENO = ASG.ENO AND ASG.PNO = PROJ.PNO

Basic Concepts/3
- Reduction of the search space
  - Restrict by means of heuristics
    - Perform unary operations before binary operations, etc.
  - Restrict the shape of the join tree
    - Consider the type of trees (linear trees vs. bushy trees)

[Figure: a linear join tree and a bushy join tree]

Basic Concepts/4
- There are two main strategies to scan the search space
  - Deterministic
  - Randomized
- Deterministic scan of the search space
  - Start from base relations and build plans by adding one relation at each step
  - Breadth-first strategy (BFS): build all possible plans before choosing the best plan (dynamic programming approach)
  - Depth-first strategy (DFS): build only one plan (greedy approach)
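
The two deterministic strategies can be contrasted on a toy example. The sketch below is only an illustration under stated assumptions, not an optimizer taken from the slides: the cardinalities in `CARD`, the join selectivities in `SEL`, and the cost measure (sum of the intermediate result sizes of a linear join tree) are all hypothetical. `exhaustive` stands in for the breadth-first idea by costing every linear order (a real dynamic programming optimizer would additionally prune dominated subplans), while `greedy` builds a single plan depth-first.

```python
from fractions import Fraction
from itertools import permutations
from math import prod

# Hypothetical statistics (not from the slides).
CARD = {"EMP": 400, "ASG": 1000, "PROJ": 100}
SEL = {
    frozenset({"EMP", "ASG"}): Fraction(1, 400),
    frozenset({"ASG", "PROJ"}): Fraction(1, 100),
}

def result_card(rels):
    """Estimated cardinality of joining all relations in rels."""
    rels = list(rels)
    size = Fraction(prod(CARD[r] for r in rels))
    for i, a in enumerate(rels):
        for b in rels[i + 1:]:
            # Pairs without a join predicate form a Cartesian product.
            size *= SEL.get(frozenset({a, b}), 1)
    return size

def plan_cost(order):
    """Sum of the intermediate result sizes of a linear join tree."""
    return sum(result_card(order[:k]) for k in range(2, len(order) + 1))

def exhaustive(relations):
    """Cost every linear order and keep the cheapest one."""
    return min(permutations(relations), key=plan_cost)

def greedy(relations):
    """Build one plan, always adding the relation with the smallest result."""
    remaining = sorted(relations)
    order = [min(remaining, key=CARD.get)]
    remaining.remove(order[0])
    while remaining:
        nxt = min(remaining, key=lambda r: result_card(order + [r]))
        order.append(nxt)
        remaining.remove(nxt)
    return tuple(order)
```

With these numbers both strategies avoid the Cartesian product of EMP and PROJ; in general the greedy plan can be worse than the exhaustively chosen one.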

Basic Concepts/5
- Randomized scan of the search space
  - Search for optimal solutions around a particular starting point
  - e.g., iterative improvement or simulated annealing techniques
  - Trades optimization time for execution time
    - Does not guarantee that the best solution is obtained, but avoids the high cost of optimization
  - The strategy is better when more than 5-6 relations are involved

Distributed Cost Model/1

- Two different types of cost functions can be used
  - Reduce total time
    - Reduce each cost component (in terms of time) individually, i.e., do as little for each cost component as possible
    - Optimize the utilization of the resources (i.e., increase system throughput)
  - Reduce response time
    - Do as many things in parallel as possible
    - May increase total time because of increased total activity

Distributed Cost Model/2

- Total time: sum of the time of all individual components
  - Local processing time: CPU time + I/O time
  - Communication time: fixed time to initiate a message + time to transmit the data

  $\text{Total time} = T_{CPU} \cdot \#\text{instructions} + T_{I/O} \cdot \#\text{I/Os} + T_{MSG} \cdot \#\text{messages} + T_{TR} \cdot \#\text{bytes}$

- The individual components of the total cost have different weights:
  - Wide area network
    - Message initiation and transmission costs are high
    - Local processing cost is low (fast mainframes or minicomputers)
    - Ratio of communication to I/O costs is 20:1
  - Local area networks
    - Communication and local processing costs are more or less equal
    - Ratio of communication to I/O costs is 1:1.6 (10MB/s network)

Distributed Cost Model/3

- Response time: elapsed time between the initiation and the completion of a query

  $\text{Response time} = T_{CPU} \cdot \#\text{seq\_instructions} + T_{I/O} \cdot \#\text{seq\_I/Os} + T_{MSG} \cdot \#\text{seq\_messages} + T_{TR} \cdot \#\text{seq\_bytes}$

  where $\#\text{seq\_}x$ ($x$ in instructions, I/Os, messages, bytes) is the maximum number of $x$ which must be done sequentially.
- Any processing and communication done in parallel is ignored

Distributed Cost Model/4

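- Example: Query at site 3 with data from sites 1 and 2, where $x$ and $y$ denote the amounts of data shipped from site 1 and site 2 to site 3.
  - Assume that only the communication cost is considered
  - $\text{Total time} = T_{MSG} \cdot 2 + T_{TR} \cdot (x + y)$
  - $\text{Response time} = \max\{T_{MSG} + T_{TR} \cdot x,\ T_{MSG} + T_{TR} \cdot y\}$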
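
The difference between the two cost functions is easy to check numerically. The following minimal sketch evaluates both formulas of the example, considering communication cost only; the function names and the concrete values for $T_{MSG}$, $T_{TR}$, $x$, and $y$ are assumptions chosen for illustration.

```python
def total_time(t_msg, t_tr, transfers):
    """Sum of all message and transmission costs (one message per transfer)."""
    return sum(t_msg + t_tr * size for size in transfers)

def response_time(t_msg, t_tr, transfers):
    """Cost of the slowest transfer when all transfers run in parallel."""
    return max(t_msg + t_tr * size for size in transfers)

# Hypothetical values: x = 100 and y = 300 units, T_MSG = 10, T_TR = 1.
x, y = 100, 300
print(total_time(10, 1, [x, y]))     # 2 * 10 + 1 * (100 + 300)
print(response_time(10, 1, [x, y]))  # max(10 + 100, 10 + 300)
```

Total time grows with every additional transfer, whereas response time is determined only by the longest transfer in the parallel set.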

Database Statistics/1

- The primary cost factor is the size of intermediate relations
  - that are produced during the execution and
  - must be transmitted over the network, if a subsequent operation is located on a different site
- It is costly to compute the size of the intermediate relations precisely.
- Instead, global statistics of relations and fragments are computed and used to provide approximations

Database Statistics/2

- Let $R(A_1, A_2, \ldots, A_k)$ be a relation fragmented into $R_1, R_2, \ldots, R_r$.
- Relation statistics
  - min and max values of each attribute: $\min\{A_i\}$, $\max\{A_i\}$
  - length of each attribute: $length(A_i)$
  - number of distinct values in each domain: $card(dom(A_i))$
- Fragment statistics
  - cardinality of the fragment: $card(R_i)$
  - cardinality of each attribute of each fragment: $card(\Pi_{A_i}(R_j))$, $card(A_i)$

Database Statistics/3
- Selectivity factor of an operation: the proportion of tuples of an operand relation that participate in the result of that operation
- Assumption: independent attributes and uniform distribution of attribute values
- Selectivity factor of selection

  $SF_\sigma(A = value) = \frac{1}{card(\Pi_A(R))}$

  $SF_\sigma(A > value) = \frac{\max(A) - value}{\max(A) - \min(A)}$

  $SF_\sigma(A < value) = \frac{value - \min(A)}{\max(A) - \min(A)}$

Database Statistics/4

- Properties of the selectivity factor of the selection

  $SF_\sigma(p(A_i) \wedge p(A_j)) = SF_\sigma(p(A_i)) \cdot SF_\sigma(p(A_j))$

  $SF_\sigma(p(A_i) \vee p(A_j)) = SF_\sigma(p(A_i)) + SF_\sigma(p(A_j)) - (SF_\sigma(p(A_i)) \cdot SF_\sigma(p(A_j)))$

  $SF_\sigma(A \in \{values\}) = SF_\sigma(A = value) \cdot card(\{values\})$
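
The selectivity formulas for selections translate directly into small helper functions. This is a minimal sketch under the slide's assumptions (independent attributes, uniformly distributed values); the function names and the sample numbers are hypothetical.

```python
def sf_eq(distinct_values):
    """SF(A = value): one out of card(Pi_A(R)) distinct values."""
    return 1 / distinct_values

def sf_gt(value, min_a, max_a):
    """SF(A > value) under a uniform distribution on [min_a, max_a]."""
    return (max_a - value) / (max_a - min_a)

def sf_lt(value, min_a, max_a):
    """SF(A < value) under a uniform distribution on [min_a, max_a]."""
    return (value - min_a) / (max_a - min_a)

def sf_and(sf_i, sf_j):
    """Conjunction of predicates on independent attributes."""
    return sf_i * sf_j

def sf_or(sf_i, sf_j):
    """Disjunction of predicates on independent attributes."""
    return sf_i + sf_j - sf_i * sf_j

def sf_in(distinct_values, number_of_values):
    """SF(A in {values}) = SF(A = value) * card({values})."""
    return sf_eq(distinct_values) * number_of_values

# Hypothetical attribute with values between 0 and 100:
print(sf_gt(75, 0, 100))   # a quarter of the value range lies above 75
print(sf_or(0.5, 0.5))     # overlap of the two predicates is subtracted once
```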

Database Statistics/5
- Cardinality of intermediate results
  - Selection

    $card(\sigma_P(R)) = SF_\sigma(P) \cdot card(R)$
  - Projection
    - More difficult: correlations between projected attributes are unknown
    - Simple if the projected attribute is a key

      $card(\Pi_A(R)) = card(R)$
  - Cartesian product

    $card(R \times S) = card(R) \cdot card(S)$
  - Union
    - upper bound: $card(R \cup S) \le card(R) + card(S)$
    - lower bound: $card(R \cup S) \ge \max\{card(R), card(S)\}$
  - Set difference
    - upper bound: $card(R - S) = card(R)$
    - lower bound: 0

Database Statistics/6
- Selectivity factor for joins

  $SF_{\bowtie} = \frac{card(R \bowtie S)}{card(R) \cdot card(S)}$
- Cardinality of joins
  - Upper bound: cardinality of the Cartesian product

    $card(R \bowtie S) \le card(R) \cdot card(S)$
  - General case (if SF is given):

    $card(R \bowtie S) = SF_{\bowtie} \cdot card(R) \cdot card(S)$
  - Special case: R.A is a key of R and S.A is a foreign key of S; each S-tuple matches with at most one tuple of R

    $card(R \bowtie_{R.A = S.A} S) = card(S)$

Database Statistics/7

- Selectivity factor for semijoins: fraction of R-tuples that join with S-tuples
  - An approximation is the selectivity of A in S

    $SF_{\ltimes}(R \ltimes_A S) = SF_{\ltimes}(S.A) = \frac{card(\Pi_A(S))}{card(dom[A])}$
- Cardinality of semijoin (general case):

  $card(R \ltimes_A S) = SF_{\ltimes}(S.A) \cdot card(R)$
- Example: R.A is a foreign key referencing S (S.A is a primary key). Then SF = 1 and the result size corresponds to the size of R
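
The cardinality estimates for intermediate results can be collected in a few helper functions. The sketch below follows the formulas of the preceding slides; the function names and the sample statistics are assumptions for illustration only.

```python
def card_selection(sf, card_r):
    """card(sigma_P(R)) = SF(P) * card(R)."""
    return sf * card_r

def card_join(sf_join, card_r, card_s):
    """General case: card(R join S) = SF_join * card(R) * card(S)."""
    return sf_join * card_r * card_s

def card_semijoin(distinct_a_in_s, dom_a, card_r):
    """card(R semijoin_A S) with SF approximated by card(Pi_A(S)) / card(dom[A])."""
    return card_r * distinct_a_in_s / dom_a

def union_bounds(card_r, card_s):
    """(lower bound, upper bound) for card(R union S)."""
    return max(card_r, card_s), card_r + card_s

# Hypothetical statistics: 20 of 100 possible A-values occur in S,
# and R has 1000 tuples.
print(card_semijoin(20, 100, 1000))
print(union_bounds(3, 5))
```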

Join Ordering in Fragment Queries/1

- Join ordering is an important aspect in a centralized DBMS, and it is even more important in a DDBMS since joins between fragments that are stored at different sites may increase the communication time.
- Two approaches exist:
  - Optimize the ordering of joins directly
    - INGRES and distributed INGRES
    - System R and System R*
  - Replace joins by combinations of semijoins in order to minimize the communication costs
    - Hill Climbing and SDD-1

Join Ordering in Fragment Queries/2

- Direct join ordering of two relations/fragments located at different sites
  - Move the smaller relation to the other site
  - We have to estimate the size of R and S

Join Ordering in Fragment Queries/3

- Direct join ordering of queries involving more than two relations is substantially more complex
- Example: Consider the following query and the respective join graph, where we also make assumptions about the locations of the three relations/fragments

  $PROJ \bowtie_{PNO} ASG \bowtie_{ENO} EMP$

[Figure: join graph of the query; EMP is stored at site 1, ASG at site 2, and PROJ at site 3]

Join Ordering in Fragment Queries/4

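- Example (contd.): The query can be evaluated in at least 5 different ways.
  - Plan 1:
    - $EMP \rightarrow$ Site 2
    - Site 2: $EMP' = EMP \bowtie ASG$
    - $EMP' \rightarrow$ Site 3
    - Site 3: $EMP' \bowtie PROJ$
  - Plan 2:
    - $ASG \rightarrow$ Site 1
    - Site 1: $EMP' = EMP \bowtie ASG$
    - $EMP' \rightarrow$ Site 3
    - Site 3: $EMP' \bowtie PROJ$
  - Plan 3:
    - $ASG \rightarrow$ Site 3
    - Site 3: $ASG' = ASG \bowtie PROJ$
    - $ASG' \rightarrow$ Site 1
    - Site 1: $ASG' \bowtie EMP$
  - Plan 4:
    - $PROJ \rightarrow$ Site 2
    - Site 2: $PROJ' = PROJ \bowtie ASG$
    - $PROJ' \rightarrow$ Site 1
    - Site 1: $PROJ' \bowtie EMP$
  - Plan 5:
    - $EMP \rightarrow$ Site 2
    - $PROJ \rightarrow$ Site 2
    - Site 2: $EMP \bowtie PROJ \bowtie ASG$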

Join Ordering in Fragment Queries/5

- To select a plan, a lot of information is needed, including
  - $size(EMP)$, $size(ASG)$, $size(PROJ)$
  - $size(EMP \bowtie ASG)$, $size(ASG \bowtie PROJ)$
  - Possibilities of parallel execution if response time is used

Semijoin Based Algorithms/1

- Semijoins can be used to efficiently implement joins
  - The semijoin acts as a size reducer (similar to a selection) such that smaller relations need to be transferred
- Consider two relations: R located at site 1 and S located at site 2
  - Solution with semijoins: Replace one or both operand relations/fragments by a semijoin, using the following rules:

    $R \bowtie_A S \iff (R \ltimes_A S) \bowtie_A S$

    $\iff R \bowtie_A (S \ltimes_A R)$

    $\iff (R \ltimes_A S) \bowtie_A (S \ltimes_A R)$
- The semijoin is beneficial if the cost to produce and send it to the other site is less than the cost of sending the whole operand relation and doing the actual join.

Semijoin Based Algorithms/2

- Cost analysis $R \bowtie_A S$ vs. $(R \ltimes_A S) \bowtie S$, assuming that $size(R) < size(S)$
  - Perform the join $R \bowtie S$:
    - $R \rightarrow$ Site 2
    - Site 2 computes $R \bowtie S$
  - Perform the semijoins $(R \ltimes S) \bowtie S$:
    - $S' = \Pi_A(S)$
    - $S' \rightarrow$ Site 1
    - Site 1 computes $R' = R \ltimes S'$
    - $R' \rightarrow$ Site 2
    - Site 2 computes $R' \bowtie S$
  - Semijoin is better if: $size(\Pi_A(S)) + size(R \ltimes S) < size(R)$
- The semijoin approach is better if the semijoin acts as a sufficient reducer (i.e., a few tuples of R participate in the join)
- The join approach is better if almost all tuples of R participate in the join
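
The comparison between the plain join and the semijoin program can be tried on small in-memory relations. The sketch below is illustrative only: relations are lists of tuples, `size` is a crude measure (the number of attribute values), and the sample data are hypothetical. It follows the steps of the semijoin program and then checks the condition $size(\Pi_A(S)) + size(R \ltimes S) < size(R)$.

```python
def size(tuples):
    """Crude size measure (an assumption): total number of attribute values."""
    return sum(len(t) for t in tuples)

def semijoin_is_better(r, s, a_r, a_s):
    """Return (decision, R') for joining R (site 1) with S (site 2).

    a_r and a_s are the positions of the join attribute A in R and S.
    """
    s_proj = {(t[a_s],) for t in s}                     # S' = Pi_A(S), sent to site 1
    r_reduced = [t for t in r if (t[a_r],) in s_proj]   # R' = R semijoin S'
    return size(s_proj) + size(r_reduced) < size(r), r_reduced

# Hypothetical data: R has 10 tuples with 3 attributes each.
R = [(i, "name%d" % i, "x") for i in range(10)]
S_selective = [(1, "a"), (2, "b"), (1, "c")]     # matches few R-tuples
S_unselective = [(i, "p") for i in range(10)]    # matches every R-tuple

print(semijoin_is_better(R, S_selective, 0, 0)[0])    # semijoin reduces R strongly
print(semijoin_is_better(R, S_unselective, 0, 0)[0])  # shipping R directly is cheaper
```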

INGRES Algorithm/1

- INGRES uses a dynamic query optimization algorithm that recursively breaks a query into smaller pieces. It is based on the following ideas:
  - An n-relation query q is decomposed into n subqueries $q_1 \rightarrow q_2 \rightarrow \cdots \rightarrow q_n$
    - Each $q_i$ is a mono-relation (mono-variable) query
    - The output of $q_i$ is consumed by $q_{i+1}$
  - For the decomposition two basic techniques are used: detachment and substitution
  - There is a processor that can efficiently process mono-relation queries
    - Optimizes each query independently for the access to a single relation

INGRES Algorithm/2
- Detachment: Break a query q into $q' \rightarrow q''$, based on a common relation that is the result of q', i.e.
  - The query

        q:   SELECT R2.A2, ..., Rn.An
             FROM   R1, R2, ..., Rn
             WHERE  P1(R1.A1')
             AND    P2(R1.A1, ..., Rn.An)

    is decomposed by detachment of the common relation R1 into

        q':  SELECT R1.A1
             INTO   R1'
             FROM   R1
             WHERE  P1(R1.A1')

        q'': SELECT R2.A2, ..., Rn.An
             FROM   R1', R2, ..., Rn
             WHERE  P2(R1'.A1, ..., Rn.An)

- Detachment reduces the size of the relation on which the query q'' is defined.

INGRES Algorithm/3
- Example: Consider query q1: "Names of employees working on the CAD/CAM project"

      q1:  SELECT EMP.ENAME
           FROM   EMP, ASG, PROJ
           WHERE  EMP.ENO = ASG.ENO
           AND    ASG.PNO = PROJ.PNO
           AND    PROJ.PNAME = "CAD/CAM"

- Decompose q1 into $q_{11} \rightarrow q'$:

      q11: SELECT PROJ.PNO
           INTO   JVAR
           FROM   PROJ
           WHERE  PROJ.PNAME = "CAD/CAM"

      q':  SELECT EMP.ENAME
           FROM   EMP, ASG, JVAR
           WHERE  EMP.ENO = ASG.ENO
           AND    ASG.PNO = JVAR.PNO

INGRES Algorithm/4
- Example (contd.): The successive detachments may transform q' into $q_{12} \rightarrow q_{13}$:

      q':  SELECT EMP.ENAME
           FROM   EMP, ASG, JVAR
           WHERE  EMP.ENO = ASG.ENO
           AND    ASG.PNO = JVAR.PNO

      q12: SELECT ASG.ENO
           INTO   GVAR
           FROM   ASG, JVAR
           WHERE  ASG.PNO = JVAR.PNO

      q13: SELECT EMP.ENAME
           FROM   EMP, GVAR
           WHERE  EMP.ENO = GVAR.ENO

  - q1 is now decomposed by detachment into $q_{11} \rightarrow q_{12} \rightarrow q_{13}$
  - q11 is a mono-relation query
  - q12 and q13 are multi-relation queries, which cannot be further detached; also called irreducible

INGRES Algorithm/5
- Tuple substitution converts an irreducible query q into mono-relation queries.
  - Choose a relation R1 in q for tuple substitution
  - For each tuple in R1, replace the R1-attributes referred to in q by their actual values, thereby generating a set of subqueries q' with n - 1 relations, i.e.,

    $q(R_1, R_2, \ldots, R_n)$ is replaced by $\{q'(t_{1i}, R_2, \ldots, R_n),\ t_{1i} \in R_1\}$
- Example (contd.): Assume GVAR consists only of the tuples {E1, E2}. Then q13 is rewritten with tuple substitution in the following way

      q13:  SELECT EMP.ENAME
            FROM   EMP, GVAR
            WHERE  EMP.ENO = GVAR.ENO

      q131: SELECT EMP.ENAME
            FROM   EMP
            WHERE  EMP.ENO = "E1"

INGRES Algorithm/6

- Example (contd.):

      q132: SELECT EMP.ENAME
            FROM   EMP
            WHERE  EMP.ENO = "E2"

  - q131 and q132 are mono-relation queries
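
The decomposition of q1 by detachment and tuple substitution can be replayed on small in-memory relations. The following is a minimal sketch, not the INGRES implementation: relations are lists of dictionaries, the sample tuples (employee names, project P2) are hypothetical, and `mono_query` plays the role of the processor for mono-relation queries.

```python
# Hypothetical sample data for the EMP, ASG and PROJ relations.
EMP = [{"ENO": "E1", "ENAME": "J. Doe"},
       {"ENO": "E2", "ENAME": "M. Smith"},
       {"ENO": "E3", "ENAME": "A. Lee"}]
ASG = [{"ENO": "E1", "PNO": "P1"},
       {"ENO": "E2", "PNO": "P1"},
       {"ENO": "E3", "PNO": "P2"}]
PROJ = [{"PNO": "P1", "PNAME": "CAD/CAM"},
        {"PNO": "P2", "PNAME": "Maintenance"}]

def mono_query(relation, attr, value, out_attr):
    """Mono-relation query: SELECT out_attr FROM relation WHERE attr = value."""
    return [t[out_attr] for t in relation if t[attr] == value]

def substitute(outer_values, relation, attr, out_attr):
    """Tuple substitution: run one mono-relation query per outer value."""
    result = []
    for value in outer_values:
        for x in mono_query(relation, attr, value, out_attr):
            if x not in result:   # keep the result duplicate-free
                result.append(x)
    return result

jvar = mono_query(PROJ, "PNAME", "CAD/CAM", "PNO")   # q11
gvar = substitute(jvar, ASG, "PNO", "ENO")           # q12, evaluated per JVAR tuple
names = substitute(gvar, EMP, "ENO", "ENAME")        # q13 -> q131, q132
print(gvar, names)
```

Each call to `mono_query` inside `substitute` corresponds to one generated subquery such as q131 or q132.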

Distributed INGRES Algorithm

- The distributed INGRES query optimization algorithm is very similar to the centralized INGRES algorithm.
  - In addition to the centralized INGRES, the distributed one should break up each query $q_i$ into sub-queries that operate on fragments; only horizontal fragmentation is handled.
  - Optimization with respect to a combination of communication cost and response time

System R Algorithm/1
- The System R (centralized) query optimization algorithm
  - Performs static query optimization based on exhaustive search of the solution space and a cost function (IO cost + CPU cost)
    - Input: relational algebra tree
    - Output: optimal relational algebra tree
    - Dynamic programming technique is applied to reduce the number of alternative plans
  - The optimization algorithm consists of two steps
    1. Predict the best access method to each individual relation (mono-relation query)
       - Consider using index, file scan, etc.
    2. For each relation R, estimate the best join ordering
       - R is first accessed using its best single-relation access method
       - Efficient access to inner relation is crucial
  - Considers two different join strategies
    - (Indexed-) nested loop join
    - Sort-merge join

System R Algorithm/2
- Example: Consider query q1: "Names of employees working on the CAD/CAM project"

  $PROJ \bowtie_{PNO} ASG \bowtie_{ENO} EMP$
  - Join graph

    [Figure: join graph of the query]
  - Indexes
    - EMP has an index on ENO
    - ASG has an index on PNO
    - PROJ has an index on PNO and an index on PNAME

System R Algorithm/3

- Example (contd.): Step 1: Select the best single-relation access paths
  - EMP: sequential scan (because there is no selection on EMP)
  - ASG: sequential scan (because there is no selection on ASG)
  - PROJ: index on PNAME (because there is a selection on PROJ based on PNAME)

System R Algorithm/4
- Example (contd.): Step 2: Select the best join ordering for each relation
  - $(EMP \times PROJ)$ and $(PROJ \times EMP)$ are pruned because they are Cartesian products (CPs)
  - $(ASG \bowtie PROJ)$ is pruned because (we assume) it has higher cost than $(PROJ \bowtie ASG)$; similarly for $(ASG \bowtie EMP)$
  - Best total join order is $((PROJ \bowtie ASG) \bowtie EMP)$, since it uses the indexes best
    - Select PROJ using index on PNAME
    - Join with ASG using index on PNO
    - Join with EMP using index on ENO

Distributed System R Algorithm/1

- The System R* query optimization algorithm is an extension of the System R query optimization algorithm with the following main characteristics:
  - Only whole relations can be distributed, i.e., fragmentation and replication are not considered
  - Query compilation is a distributed task, coordinated by a master site, where the query is initiated
  - Master site makes all inter-site decisions, e.g., selection of the execution sites, join ordering, method of data transfer, ...
  - The local sites do the intra-site (local) optimizations, e.g., local joins, access paths
- Join ordering and data transfer between different sites are the most critical issues to be considered by the master site

Distributed System R Algorithm/2

- Two methods for inter-site data transfer
  - Ship whole: The entire relation is shipped to the join site and stored in a temporary relation
    - Larger data transfer
    - Smaller number of messages
    - Better if relations are small
  - Fetch as needed: The outer relation is sequentially scanned, and for each tuple the join value is sent to the site of the inner relation and the matching inner tuples are sent back (i.e., semijoin)
    - Number of messages = O(cardinality of outer relation)
    - Data transfer per message is minimal
    - Better if relations are large and the selectivity is good

Distributed System R Algorithm/3

- Four main join strategies for $R \bowtie S$:
  - R is outer relation
  - S is inner relation
- Notation:
  - LT denotes local processing time
  - CT denotes communication time
  - s denotes the average number of S-tuples that match an R-tuple
- Strategy 1: Ship the entire outer relation to the site of the inner relation, i.e.,
  - Retrieve outer tuples
  - Send them to the inner relation site
  - Join them as they arrive

  $\text{Total cost} = LT(\text{retrieve } card(R) \text{ tuples from } R)$
  $\quad + CT(size(R))$
  $\quad + LT(\text{retrieve } s \text{ tuples from } S) \cdot card(R)$

Distributed System R Algorithm/4

- Strategy 2: Ship the entire inner relation to the site of the outer relation. We cannot join the tuples as they arrive; they need to be stored.
  - The inner relation S needs to be stored in a temporary relation T

  $\text{Total cost} = LT(\text{retrieve } card(S) \text{ tuples from } S)$
  $\quad + CT(size(S))$
  $\quad + LT(\text{store } card(S) \text{ tuples in } T)$
  $\quad + LT(\text{retrieve } card(R) \text{ tuples from } R)$
  $\quad + LT(\text{retrieve } s \text{ tuples from } T) \cdot card(R)$

Distributed System R Algorithm/5

- Strategy 3: Fetch tuples of the inner relation as needed for each tuple of the outer relation.
  - For each R-tuple, the join attribute A is sent to the site of S
  - The s matching S-tuples are retrieved and sent to the site of R

  $\text{Total cost} = LT(\text{retrieve } card(R) \text{ tuples from } R)$
  $\quad + CT(length(A)) \cdot card(R)$
  $\quad + LT(\text{retrieve } s \text{ tuples from } S) \cdot card(R)$
  $\quad + CT(s \cdot length(S)) \cdot card(R)$

Distributed System R Algorithm/6

- Strategy 4: Move both relations to a third site and compute the join there.
  - The inner relation S is first moved to a third site and stored in a temporary relation T.
  - Then the outer relation is moved to the third site and its tuples are joined as they arrive.

  $\text{Total cost} = LT(\text{retrieve } card(S) \text{ tuples from } S)$
  $\quad + CT(size(S))$
  $\quad + LT(\text{store } card(S) \text{ tuples in } T)$
  $\quad + LT(\text{retrieve } card(R) \text{ tuples from } R)$
  $\quad + CT(size(R))$
  $\quad + LT(\text{retrieve } s \text{ tuples from } T) \cdot card(R)$
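
The four cost formulas can be compared side by side once LT and CT are given concrete shapes. The sketch below assumes, purely for illustration, that local processing costs a fixed amount per tuple and that communication costs a fixed message overhead plus a per-byte rate; the class `Params`, its field names, and all numeric values are hypothetical and not part of the slides.

```python
from dataclasses import dataclass

@dataclass
class Params:
    card_r: int             # number of tuples in the outer relation R
    size_r: int             # size of R in bytes
    card_s: int             # number of tuples in the inner relation S
    size_s: int             # size of S in bytes
    s: int                  # average number of S-tuples matching an R-tuple
    len_a: int              # length of the join attribute A in bytes
    len_s: int              # length of one S-tuple in bytes
    lt_tuple: float = 1.0   # assumed local processing time per tuple
    t_msg: float = 10.0     # assumed fixed cost per message
    t_tr: float = 0.5       # assumed transmission cost per byte

def lt(p, tuples):
    return p.lt_tuple * tuples

def ct(p, nbytes):
    return p.t_msg + p.t_tr * nbytes

def strategy_costs(p):
    probe = lt(p, p.s) * p.card_r   # retrieve s matching tuples per R-tuple
    return {
        "1 ship outer": lt(p, p.card_r) + ct(p, p.size_r) + probe,
        "2 ship inner": (lt(p, p.card_s) + ct(p, p.size_s) + lt(p, p.card_s)
                         + lt(p, p.card_r) + probe),
        "3 fetch as needed": (lt(p, p.card_r) + ct(p, p.len_a) * p.card_r
                              + probe + ct(p, p.s * p.len_s) * p.card_r),
        "4 third site": (lt(p, p.card_s) + ct(p, p.size_s) + lt(p, p.card_s)
                         + lt(p, p.card_r) + ct(p, p.size_r) + probe),
    }

costs = strategy_costs(Params(card_r=100, size_r=1000, card_s=50,
                              size_s=500, s=2, len_a=4, len_s=10))
print(min(costs, key=costs.get), costs)
```

With a high per-message cost, as assumed here, fetch as needed is penalized by its two messages per outer tuple; lowering `t_msg` shifts the balance.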

Hill-Climbing Algorithm/1

- Hill-Climbing query optimization algorithm
  - Refinements of an initial feasible solution are recursively computed until no more cost improvements can be made
  - Semijoins, data replication, and fragmentation are not used
  - Devised for wide area point-to-point networks
  - The first distributed query processing algorithm

Hill-Climbing Algorithm/2
- The hill-climbing algorithm proceeds as follows
  1. Select initial feasible execution strategy ES0
     - i.e., a global execution schedule that includes all intersite communication
     - Determine the candidate result sites, where a relation referenced in the query exists
     - Compute the cost of transferring all the other referenced relations to each candidate site
     - ES0 = candidate site with minimum cost
  2. Split ES0 into two strategies: ES1 followed by ES2
     - ES1: send one of the relations involved in the join to the other relation's site
     - ES2: send the join result to the final result site
  3. Replace ES0 with the split schedule which gives

     $cost(ES1) + cost(\text{local join}) + cost(ES2) < cost(ES0)$
  4. Recursively apply steps 2 and 3 on ES1 and ES2 until no more benefit can be gained
  5. Check for redundant transmissions in the final plan and eliminate them

Hill-Climbing Algorithm/3
- Example: What are the salaries of engineers who work on the CAD/CAM project?

  $\Pi_{SAL}(PAY \bowtie_{TITLE} EMP \bowtie_{ENO} (ASG \bowtie_{PNO} (\sigma_{PNAME = \text{"CAD/CAM"}}(PROJ))))$
  - Schemas: EMP(ENO, ENAME, TITLE), ASG(ENO, PNO, RESP, DUR), PROJ(PNO, PNAME, BUDGET, LOC), PAY(TITLE, SAL)
  - Statistics

        Relation   Size   Site
        EMP        8      1
        PAY        4      2
        PROJ       1      3
        ASG        10     4

  - Assumptions:
    - Size of relations is defined as their cardinality
    - Minimize total cost
    - Transmission cost between two sites is 1
    - Ignore local processing cost
    - $size(EMP \bowtie PAY) = 8$, $size(PROJ \bowtie ASG) = 2$, $size(ASG \bowtie EMP) = 10$

Hill-Climbing Algorithm/4
- Example (contd.): Determine initial feasible execution strategy
  - Alternative 1: Resulting site is site 1

    $\text{Total cost} = cost(PAY \rightarrow \text{Site 1}) + cost(ASG \rightarrow \text{Site 1}) + cost(PROJ \rightarrow \text{Site 1}) = 4 + 10 + 1 = 15$
  - Alternative 2: Resulting site is site 2

    $\text{Total cost} = 8 + 10 + 1 = 19$
  - Alternative 3: Resulting site is site 3

    $\text{Total cost} = 8 + 4 + 10 = 22$
  - Alternative 4: Resulting site is site 4

    $\text{Total cost} = 8 + 4 + 1 = 13$
  - Therefore ES0 = $EMP \rightarrow$ Site 4; $PAY \rightarrow$ Site 4; $PROJ \rightarrow$ Site 4
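
The choice of the initial feasible strategy can be reproduced in a few lines. This minimal sketch uses the sizes and sites of the example and the stated assumptions (cost equals the cardinality shipped, local processing ignored); the function and variable names are hypothetical.

```python
# Statistics of the example: relation -> size, relation -> site.
SIZE = {"EMP": 8, "PAY": 4, "PROJ": 1, "ASG": 10}
SITE = {"EMP": 1, "PAY": 2, "PROJ": 3, "ASG": 4}

def initial_strategy(size, site):
    """Cost of shipping all other relations to each candidate result site."""
    costs = {}
    for target in sorted(set(site.values())):
        costs[target] = sum(size[r] for r in size if site[r] != target)
    best = min(costs, key=costs.get)
    return best, costs

best_site, costs = initial_strategy(SIZE, SITE)
print(best_site, costs)
```

The computed costs match the four alternatives above, and the cheapest candidate is site 4.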

Hill-Climbing Algorithm/5
- Example (contd.): Candidate split
  - Alternative 1: ES1, ES2, ES3
    - ES1: $EMP \rightarrow$ Site 2
    - ES2: $(EMP \bowtie PAY) \rightarrow$ Site 4
    - ES3: $PROJ \rightarrow$ Site 4

    $\text{Total cost} = cost(EMP \rightarrow \text{Site 2}) + cost((EMP \bowtie PAY) \rightarrow \text{Site 4}) + cost(PROJ \rightarrow \text{Site 4}) = 8 + 8 + 1 = 17$
  - Alternative 2: ES1, ES2, ES3
    - ES1: $PAY \rightarrow$ Site 1
    - ES2: $(PAY \bowtie EMP) \rightarrow$ Site 4
    - ES3: $PROJ \rightarrow$ Site 4

    $\text{Total cost} = cost(PAY \rightarrow \text{Site 1}) + cost((PAY \bowtie EMP) \rightarrow \text{Site 4}) + cost(PROJ \rightarrow \text{Site 4}) = 4 + 8 + 1 = 13$
  - Neither alternative is better than ES0, so keep ES0 (or take alternative 2, which has the same cost)

Hill-Climbing Algorithm/6

- Problems
  - Greedy algorithm determines an initial feasible solution and iteratively improves it
  - If there are local minima, it may not find the global minimum
  - An optimal schedule with a high initial cost would not be found, since it won't be chosen as the initial feasible solution
- Example: A better schedule is
  - $PROJ \rightarrow$ Site 4
  - $ASG' = (PROJ \bowtie ASG) \rightarrow$ Site 1
  - $(ASG' \bowtie EMP) \rightarrow$ Site 2
  - Total cost = 1 + 2 + 2 = 5

SDD-1
- The SDD-1 algorithm extends the hill climbing algorithm with semijoins and has the following properties:
  - Considers semijoins
    - $cost(R \ltimes_A S) = C_{MSG} + size(\Pi_A(S)) \cdot C_{TR}$
    - $benefit(R \ltimes_A S) = (1 - SF_{\ltimes}(S.A)) \cdot size(R) \cdot C_{TR}$
  - Does not consider replication and fragmentation
  - Cost of transferring the result to the user site from the final result site is not considered
  - Can minimize either total time or response time
- The SDD-1 algorithm works with and updates a database profile:

      R     size(R)
      R1    1500
      R2    3000
      R3    2000

      A       SF_semijoin   size(Pi_A)
      R1.A    0.3           36
      R2.A    0.8           320
      R2.B    1.0           400
      R3.B    0.4           80
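
The cost and benefit formulas can be applied to the database profile above to see which semijoins are worth considering. The sketch below rests on several assumptions that are not stated on the slide: the join graph (R1 and R2 join on A, R2 and R3 join on B, inferred from the attribute names), $C_{MSG} = 0$ and $C_{TR} = 1$, and "most beneficial" read as the largest difference between benefit and cost.

```python
# Database profile from the slide.
SIZE = {"R1": 1500, "R2": 3000, "R3": 2000}
# (relation, attribute) -> (semijoin selectivity, size of the projection)
PROFILE = {
    ("R1", "A"): (0.3, 36),
    ("R2", "A"): (0.8, 320),
    ("R2", "B"): (1.0, 400),
    ("R3", "B"): (0.4, 80),
}
# Assumed join graph: R1.A = R2.A and R2.B = R3.B.
JOINS = [("R1", "R2", "A"), ("R2", "R3", "B")]

def candidates(c_msg=0.0, c_tr=1.0):
    """Cost and benefit of every semijoin 'R semijoin_A S' along the join graph."""
    result = []
    for left, right, attr in JOINS:
        for r, s in ((left, right), (right, left)):
            sf, proj_size = PROFILE[(s, attr)]
            cost = c_msg + proj_size * c_tr
            benefit = (1 - sf) * SIZE[r] * c_tr
            result.append({"semijoin": (r, s, attr),
                           "cost": cost, "benefit": benefit})
    return result

def beneficial(cands):
    """Semijoins whose cost is smaller than their benefit."""
    return [c for c in cands if c["cost"] < c["benefit"]]

def most_beneficial(cands):
    """Beneficial semijoin with the largest (benefit - cost); an assumed criterion."""
    return max(beneficial(cands), key=lambda c: c["benefit"] - c["cost"])

cands = candidates()
print([c["semijoin"] for c in beneficial(cands)])
print(most_beneficial(cands)["semijoin"])
```

Under these assumptions only the two semijoins that reduce R2 are beneficial, and reducing R2 by R1 on A would be chosen first.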

SDD-1 Algorithm
Step 1: Include all local processing in the execution strategy ES.
Step 2: Update database profile with effects of local processing.
Step 3: Determine beneficial semijoins $\ltimes_i$, i.e., $cost(\ltimes_i) < benefit(\ltimes_i)$.
Step 4: Remove the most beneficial semijoin $\ltimes_i$ and append it to ES.
Step 5: Update the database profile.
Step 6: Update the set of beneficial semijoins; possibly include new ones.
Step 7: If there are beneficial semijoins go back to Step 4.
Step 8: Find the site where the largest amount of data resides and select it as the result site.
Step 9: For each $R_i$ at the result site, remove semijoins of the form $R_i \ltimes R_j$ where the total cost of ES without this semijoin is smaller than the cost with it.
Step 10: Permute the order of semijoins if doing so would improve the total cost of ES.

Conclusion
- Distributed query optimization is more complex than centralized query processing, since
  - bushy query trees are not necessarily a bad choice
  - one needs to decide what, where, and how to ship the relations between the sites
- Query optimization searches for the optimal query plan (tree)
- For N relations, there are $O(N!)$ equivalent join trees. To cope with the complexity, heuristics and/or restricted types of trees are considered.
- There are two main strategies in query optimization: randomized and deterministic.
- Semi-joins can be used to implement a join. The semi-joins require more operations to perform, but the amount of data transferred is reduced.
- INGRES and System R (in their distributed variants), Hill Climbing, and SDD-1 are distributed query optimization algorithms.

Course Project

- Hand in of project: December 23, 2012
- Report
  - problem definition
  - running example
  - description of solution
  - evaluation
  - strengths, weaknesses, limitations
- Report (5 pages) and implementation (source code, data, steps to install and run) as zip/tar file
- Send by email to boehlen@ifi.uzh.ch and cafagna@ifi.uzh.ch

Course Exam

Exam date: 16.01.2013

Exam time: 12:15 - 12:45

Exam location: BIN 2.E.13

- Exam form and procedure
  - oral, 20 minutes
  - 10 minutes about project (demo, code, algorithm)
  - 10 minutes about a topic of the course
- During exam: present solutions on examples
- Prepare suitable examples beforehand
