Lecture5 -Query_Processing 1

Query Processing

Uploaded by

amirosama2121


Principles of Distributed Database Systems
M. Tamer Özsu
Patrick Valduriez

© 2020, M.T. Özsu & P. Valduriez 1

Outline
◼ Introduction
◼ Distributed and parallel database design
◼ Distributed data control
◼ Distributed Query Processing
◼ Distributed Transaction Processing
◼ Data Replication
◼ Database Integration – Multidatabase Systems
◼ Parallel Database Systems
◼ Peer-to-Peer Data Management
◼ Big Data Processing
◼ NoSQL, NewSQL and Polystores
◼ Web Data Management
Outline
◼ Distributed Query Processing
❑ Query Decomposition and Localization
❑ Join Ordering
❑ Distributed Query Optimization
❑ Adaptive Query Processing


Query Processing in a DDBMS

◼ A query in a distributed DBMS generally requires data from
multiple sites. Transmitting that data between sites incurs
communication cost.

◼ Query processing in a distributed DBMS differs from that in a
centralized DBMS because of the communication cost of transferring
data over the network.

◼ The transmission cost is low when the sites are connected by a
high-speed network and can be quite significant on slower networks.


Query Processing in a DDBMS

◼ In distributed query processing, the data transfer cost consists of:

❑ The cost of transferring intermediate files to other sites for
processing, and
❑ The cost of transferring the final result file to the site where
the result is required


Distributed DBMS Environment

• Site s1 issues a query that needs data from s2 and s3.
• It is decided to execute the query at s3.
• The 1st communication cost is transferring the data from s2 to s3
→ s3 then executes the query and produces the result.
• The 2nd communication cost is transferring the result from s3 to s1.
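
The two communication costs in this scenario can be written out as simple arithmetic. The sketch below is only an illustration: the tuple counts and the per-tuple shipping cost are hypothetical numbers, not values from the slide, and the function names are made up for this example.

```python
def shipping_cost(tuples_moved, cost_per_tuple):
    """Cost of moving a number of tuples between two sites."""
    return tuples_moved * cost_per_tuple


def total_communication_cost(input_tuples_at_s2, result_tuples, cost_per_tuple):
    """Query issued at s1, executed at s3, with extra input data stored at s2."""
    first = shipping_cost(input_tuples_at_s2, cost_per_tuple)  # s2 -> s3
    second = shipping_cost(result_tuples, cost_per_tuple)      # s3 -> s1
    return first + second


# Hypothetical sizes: 1000 input tuples at s2, 50 result tuples, 10 units per tuple.
example_cost = total_communication_cost(1000, 50, 10)
```

The data already stored at s3 contributes nothing here, which is why the choice of execution site matters.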


Query Processing in a DDBMS

High-level user query → Query Processor → Low-level data manipulation commands for D-DBMS


Query Processing Components

◼ Query language
❑ SQL: “intergalactic dataspeak”

◼ Query execution
❑ The steps that one goes through in executing high-level
(declarative) user queries.

◼ Query optimization
❑ How do we determine the “best” execution plan?

◼ We assume a homogeneous D-DBMS


Selecting Alternatives
Find the names of employees who are
managing a project

Strategy 1

SELECT ENAME
FROM EMP, ASG
WHERE EMP.ENO = ASG.ENO AND RESP = "Manager"

Π_ENAME(σ_RESP=“Manager” ∧ EMP.ENO=ASG.ENO (EMP × ASG))


Selecting Alternatives
Strategy 2

SELECT ENAME
FROM EMP NATURAL JOIN ASG
WHERE RESP = "Manager"

Π_ENAME(EMP ⋈_ENO (σ_RESP=“Manager” (ASG)))

Strategy 2 avoids the Cartesian product and consumes fewer
computing resources, so it may be “better”.
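
The difference between the two strategies can be seen by counting the intermediate tuples each one builds. The following is a minimal sketch over tiny in-memory relations. The sample rows are invented for illustration, and the join in Strategy 2 is done with a set lookup, which is one possible implementation rather than the one a DBMS would necessarily choose.

```python
from itertools import product

# Hypothetical sample data: EMP(ENO, ENAME) and ASG(ENO, PNO, RESP).
EMP = [("E1", "J. Doe"), ("E2", "M. Smith"), ("E3", "A. Lee")]
ASG = [
    ("E1", "P1", "Manager"),
    ("E2", "P1", "Analyst"),
    ("E3", "P2", "Manager"),
    ("E2", "P2", "Programmer"),
]


def strategy_1(emp, asg):
    """Cartesian product first, then selection, then projection."""
    pairs = list(product(emp, asg))  # |EMP| * |ASG| intermediate tuples
    names = {e[1] for e, a in pairs if e[0] == a[0] and a[2] == "Manager"}
    return names, len(pairs)


def strategy_2(emp, asg):
    """Selection on ASG first, then join on ENO, then projection."""
    managers = [a for a in asg if a[2] == "Manager"]  # small intermediate result
    manager_enos = {a[0] for a in managers}
    names = {e[1] for e in emp if e[0] in manager_enos}
    return names, len(managers)


names_1, intermediate_1 = strategy_1(EMP, ASG)
names_2, intermediate_2 = strategy_2(EMP, ASG)
```

Both functions return the same set of names, but Strategy 1 materializes every EMP/ASG pair while Strategy 2 only keeps the manager assignments.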


Selecting Alternatives
In a distributed system,

◼ Relational algebra is not enough to express execution
strategies. It must be supplemented with operators for
exchanging data between sites.

◼ The distributed query processor must also select the
best sites to process data, and possibly the way data
should be transformed.


What is the Problem?

Site 1: ASG1 = σ_ENO≤“E3”(ASG)
Site 2: ASG2 = σ_ENO>“E3”(ASG)
Site 3: EMP1 = σ_ENO≤“E3”(EMP)
Site 4: EMP2 = σ_ENO>“E3”(EMP)
Site 5: Result

[Figure: execution plans for Strategy A and Strategy B]
Cost of Alternatives

Assume
◼ size(EMP) = 400 tuples, size(ASG) = 1000 tuples
◼ tuple access cost = 1 unit (1 operation or 1 s)
◼ tuple transfer cost = 10 units
◼ There are 20 managers in relation ASG
◼ The data is uniformly distributed among sites


Cost of Alternatives

◼ Strategy A
❑ produce ASG': (10+10) × tuple access cost = 20
❑ transfer ASG' to the sites of EMP: (10+10) × tuple transfer cost = 200
❑ produce EMP': (10+10) × tuple access cost × 2 = 40
❑ transfer EMP' to result site: (10+10) × tuple transfer cost = 200
Total Cost = 460

◼ Strategy B
❑ transfer EMP to site 5: 400 × tuple transfer cost = 4,000
❑ transfer ASG to site 5: 1000 × tuple transfer cost = 10,000
❑ produce ASG': 1000 × tuple access cost (apply condition) = 1,000
❑ join EMP and ASG': 400 × 20 (managers) × tuple access cost = 8,000
Total Cost = 23,000
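
The totals above can be checked with a few lines of arithmetic. This sketch just re-computes the slide's figures from its stated assumptions (access cost 1, transfer cost 10, 20 managers split evenly over the two ASG fragments). The constant and function names are chosen for this example only.

```python
TUPLE_ACCESS = 1     # cost units per tuple access
TUPLE_TRANSFER = 10  # cost units per tuple transferred
EMP_SIZE = 400
ASG_SIZE = 1000
MANAGERS = 20
MANAGERS_PER_FRAGMENT = MANAGERS // 2  # uniform distribution over ASG1 and ASG2


def cost_strategy_a():
    selected = MANAGERS_PER_FRAGMENT + MANAGERS_PER_FRAGMENT  # (10 + 10)
    produce_asg = selected * TUPLE_ACCESS        # select managers at sites 1 and 2
    transfer_asg = selected * TUPLE_TRANSFER     # ship ASG' to the EMP sites
    produce_emp = selected * TUPLE_ACCESS * 2    # join at sites 3 and 4
    transfer_emp = selected * TUPLE_TRANSFER     # ship EMP' to the result site
    return produce_asg + transfer_asg + produce_emp + transfer_emp


def cost_strategy_b():
    transfer_emp = EMP_SIZE * TUPLE_TRANSFER     # ship all of EMP to site 5
    transfer_asg = ASG_SIZE * TUPLE_TRANSFER     # ship all of ASG to site 5
    produce_asg = ASG_SIZE * TUPLE_ACCESS        # scan ASG to apply the condition
    join = EMP_SIZE * MANAGERS * TUPLE_ACCESS    # join EMP with ASG'
    return transfer_emp + transfer_asg + produce_asg + join


cost_a = cost_strategy_a()
cost_b = cost_strategy_b()
```

Under these assumptions Strategy B costs 50 times as much as Strategy A, almost entirely because it ships whole relations instead of filtered fragments.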


Query Optimization Objectives
◼ Minimize a cost function
❑ I/O cost + CPU cost + communication cost
❑ These might have different weights in different distributed
environments
◼ Wide area networks
❑ Communication cost may dominate or vary widely, depending on
◼ Bandwidth
◼ Speed
◼ Protocol overhead
◼ Local area networks
❑ Communication cost is less dominant, so the total cost function
should be considered
◼ Can also maximize throughput
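
One way to picture the weighted cost function is as a weighted sum whose weights depend on the environment. The sketch below assumes made-up weights and made-up plan statistics purely to show how the preferred plan can flip between a WAN and a LAN. None of the numbers come from the slides.

```python
from dataclasses import dataclass


@dataclass
class Weights:
    io: int    # cost per I/O operation
    cpu: int   # cost per CPU operation
    comm: int  # cost per tuple sent over the network


def total_cost(io_ops, cpu_ops, tuples_sent, w):
    """I/O cost + CPU cost + communication cost, each with its own weight."""
    return w.io * io_ops + w.cpu * cpu_ops + w.comm * tuples_sent


# Hypothetical plans: (io_ops, cpu_ops, tuples_sent).
PLANS = {
    "local_heavy": (1000, 1000, 10),
    "transfer_heavy": (100, 100, 200),
}

WAN = Weights(io=1, cpu=1, comm=100)  # assumed: communication dominates
LAN = Weights(io=1, cpu=1, comm=1)    # assumed: all factors comparable


def best_plan(weights):
    return min(PLANS, key=lambda name: total_cost(*PLANS[name], weights))


best_on_wan = best_plan(WAN)
best_on_lan = best_plan(LAN)
```

With these weights the plan that does more local work wins on the WAN, while the plan that ships more tuples wins on the LAN.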
Complexity of Relational Operations

◼ Assume
❑ Relations of cardinality n
❑ Sequential scan

Operation                                         Complexity
Select, Project (without duplicate elimination)   O(n)
Project (with duplicate elimination), Group       O(n log n)
Join, Semi-join, Division, Set Operators          O(n log n)
Cartesian Product                                 O(n²)

Types Of Optimizers

◼ Exhaustive search
❑ Cost-based
❑ Optimal
❑ Combinatorial complexity in the number of relations
◼ Heuristics
❑ Not optimal
❑ Regroup common sub-expressions
❑ Perform selection, projection first
❑ Replace a join by a series of semijoins
❑ Reorder operations to reduce intermediate relation size
❑ Optimize individual operations
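
The heuristic "replace a join by a series of semijoins" can be illustrated with a small sketch. It assumes relations are lists of tuples and uses invented sample data. The helper names `semijoin` and `join` are written for this example and are not taken from any library.

```python
def semijoin(r, s, key_r, key_s):
    """R ⋉ S: keep only the tuples of r that have a join partner in s."""
    keys = {t[key_s] for t in s}
    return [t for t in r if t[key_r] in keys]


def join(r, s, key_r, key_s):
    """Equi-join of r and s using a hash index built on s."""
    index = {}
    for t in s:
        index.setdefault(t[key_s], []).append(t)
    return [tr + ts for tr in r for ts in index.get(tr[key_r], [])]


# Hypothetical data: EMP(ENO, ENAME) at one site, ASG(ENO, PNO) at another.
EMP = [("E1", "J. Doe"), ("E2", "M. Smith"), ("E3", "A. Lee"), ("E4", "J. Miller")]
ASG = [("E1", "P1"), ("E3", "P2")]

# Reduce EMP before shipping it to the site of ASG, then join there.
reduced_emp = semijoin(EMP, ASG, 0, 0)
via_semijoin = join(reduced_emp, ASG, 0, 0)
direct = join(EMP, ASG, 0, 0)
```

The result is the same either way, but only the reduced relation has to be shipped. In a real system the join keys of ASG must first be sent to the EMP site, so the semijoin pays off only when that projection is small compared with the tuples it filters out.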

Optimization Granularity

◼ Single query at a time

❑ Cannot use common intermediate results

◼ Multiple queries at a time

❑ Efficient if many similar queries

❑ Decision space is much larger

Optimization Timing

◼ Static: optimization is done at query compilation time
❑ Compilation ➔ optimize prior to execution
❑ Difficult to estimate the size of intermediate results; errors propagate
❑ Can amortize over many executions
◼ Dynamic: proceeds at query execution time
❑ Run time optimization
❑ Exact information on the intermediate relation sizes
❑ Have to re-optimize for multiple executions
◼ Hybrid: tradeoff between both
❑ Compile using a static algorithm
❑ If the error in estimated sizes > threshold, re-optimize at run time
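
The hybrid rule can be sketched as a simple check made while the query runs. The relative-error definition and the threshold value below are assumptions made for illustration. The slides only state that re-optimization is triggered when the estimation error exceeds a threshold.

```python
def needs_reoptimization(estimated_size, actual_size, threshold):
    """Return True when the size estimate is off by more than the threshold.

    The error is measured relative to the estimate (an assumed definition).
    """
    if estimated_size <= 0:
        # Nothing was expected: any actual tuples count as a large error.
        return actual_size > 0
    relative_error = abs(actual_size - estimated_size) / estimated_size
    return relative_error > threshold


# Hypothetical intermediate result: the static plan assumed 100 tuples.
keep_plan = needs_reoptimization(100, 120, threshold=0.5)   # 20% off
replan = needs_reoptimization(100, 400, threshold=0.5)      # 300% off
```

A low threshold behaves more like dynamic optimization, and a high one more like static optimization.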

Statistics

◼ Relation
❑ Cardinality
❑ Size of a tuple
❑ Fraction of tuples participating in a join with another relation
◼ Attribute
❑ Cardinality of domain
❑ Actual number of distinct values
◼ Simplifying assumptions
❑ Independence between different attribute values
❑ Uniform distribution of attribute values within their domain
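
These statistics and simplifying assumptions are what make textbook-style size estimates possible. The sketch below shows the usual estimates under uniformity and independence. The function names and the figure of 50 distinct RESP values are hypothetical and chosen only so that the numbers are easy to follow.

```python
def selection_cardinality(card_r, distinct_values):
    """Estimated tuples matching A = value, assuming uniformly distributed values."""
    return card_r / distinct_values


def conjunctive_selectivity(*selectivities):
    """Selectivity of p1 AND p2 AND ..., assuming independent attributes."""
    result = 1.0
    for s in selectivities:
        result *= s
    return result


def join_cardinality(card_r, card_s, join_selectivity):
    """Estimated join size: selectivity factor times the Cartesian product size."""
    return join_selectivity * card_r * card_s


# Hypothetical: ASG has 1000 tuples and RESP takes 50 distinct values.
estimated_managers = selection_cardinality(1000, 50)
combined = conjunctive_selectivity(0.5, 0.5)
estimated_join = join_cardinality(400, 1000, 1 / 400)
```

When real data is skewed or attributes are correlated, these estimates can be far off, which is the estimation error that static optimization has to live with.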

Optimization Decision Sites

◼ Centralized
❑ Single site determines the “best” schedule
❑ Simple
❑ Need knowledge about the entire distributed database
◼ Distributed
❑ Cooperation among sites to determine the schedule
❑ Need only local information
❑ Cost of cooperation
◼ Hybrid
❑ One site determines the global schedule
❑ Each site optimizes the local subqueries

Network Topology

◼ Wide area networks (WAN) – point-to-point

❑ Characteristics
◼ Relatively low bandwidth (compared to local CPU/IO)
◼ High protocol overhead
❑ Communication cost may dominate; ignore all other cost factors
❑ Global schedule to minimize communication cost
❑ Local schedules according to centralized query optimization
◼ Local area networks (LAN)
❑ Communication cost not that dominant
❑ Total cost function should be considered
❑ Broadcasting can be exploited (joins)
❑ Special algorithms exist for star networks

Questions?
