Chapter 2-Query Processing and Optimi

i need it for purpose

Uploaded by

chuol lock

Advanced Database Systems

Chapter 2: Query Processing and Optimization

Outline

1. Translating SQL Queries into Relational Algebra
2. Basic Algorithms for Executing Query Operations
3. Semantic Query Optimization
4. Using Heuristics in Query Optimization
5. Using Selectivity and Cost Estimates in Query Optimization

• What is Query Processing?
– The steps required to transform a high-level SQL query into a correct and “efficient” execution strategy, and to carry out that strategy to retrieve the result.
• What is Query Optimization?
– The activity of choosing a single “efficient” execution strategy (from possibly hundreds of candidates), guided by statistics held in the database catalog.
– Which relational algebra expression, equivalent to the given query, will lead to the most efficient execution plan?
– How do operations pass data to one another (main memory buffer, disk buffer, …)?
– Will this plan minimize resource usage (CPU, response time, disk)?

• Relational Algebra in DBMS
– Relational algebra is a procedural query language. It mainly provides a theoretical foundation for relational databases and SQL.
– Its main purpose is to define operators that transform one or more input relations into an output relation.
– Because these operators accept relations as input and produce relations as output, they can be combined to express potentially complex queries that transform many input relations (whose data are stored in the database) into a single output relation (the query result).

• Relational Algebra in DBMS
Fundamental Operators
These are the basic operators used in relational algebra.
– Selection (σ)
– Projection (π)
– Union (∪)
– Set Difference (−)
– Set Intersection (∩), which many texts treat as derived because R ∩ S = R − (R − S)
– Rename (ρ)
– Cartesian Product (×)

• Relational Algebra in DBMS
Fundamental Operators
1. Selection (σ): selects the tuples of a relation that satisfy a given condition.
Example: for a relation R with an attribute c, σ(c>3)R selects the tuples whose value of c is greater than 3.
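
A minimal sketch of selection in Python, assuming a relation is held as a list of dictionaries. The relation `R`, its values, and the helper `select` are hypothetical and are not taken from the slides.

```python
# Hypothetical relation R with attributes a, b, c (illustrative values only).
R = [
    {"a": 1, "b": 2, "c": 4},
    {"a": 2, "b": 2, "c": 3},
    {"a": 3, "b": 5, "c": 7},
]

def select(relation, predicate):
    """sigma: keep only the tuples for which the predicate is true."""
    return [row for row in relation if predicate(row)]

# sigma(c>3)R
result = select(R, lambda row: row["c"] > 3)
print(result)
```

Selection never changes the set of attributes; it only reduces the number of tuples.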

• Relational Algebra in DBMS
Fundamental Operators
2. Projection (π): projects the required columns of a relation.
Example: Consider Table 1. Suppose we want columns B and C from relation R.
π(B,C)R returns only columns B and C.
Note: by default, projection removes duplicate rows from its result.
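
The duplicate-removal behaviour can be sketched as follows. This is an illustration only: the relation `R` and the helper `project` are hypothetical, and the data is chosen so that two rows collapse into one after projection.

```python
# Hypothetical relation R; the first two rows agree on B and C.
R = [
    {"A": 1, "B": 2, "C": 4},
    {"A": 2, "B": 2, "C": 4},
    {"A": 3, "B": 5, "C": 7},
]

def project(relation, attributes):
    """pi: keep the named attributes and drop duplicate result rows."""
    seen = set()
    result = []
    for row in relation:
        key = tuple(row[a] for a in attributes)
        if key not in seen:          # set semantics: each row appears once
            seen.add(key)
            result.append(dict(zip(attributes, key)))
    return result

# pi(B,C)R
projected = project(R, ["B", "C"])
print(projected)
```

Note that SQL's plain `SELECT B, C` keeps duplicates; the set-based behaviour shown here corresponds to `SELECT DISTINCT`.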

• Relational Algebra in DBMS
Fundamental Operators
3. Union (∪): the union operation in relational algebra is the same as the union operation in set theory.
Example: consider two tables, FRENCH and GERMAN, of students taking different optional subjects in their course.
π(Student_Name)FRENCH ∪ π(Student_Name)GERMAN

• Relational Algebra in DBMS
Fundamental Operators
4. Set Difference (−): set difference in relational algebra is the same as the set difference operation in set theory.
Example: for the FRENCH and GERMAN tables above, set difference is used as follows:
π(Student_Name)FRENCH − π(Student_Name)GERMAN

• Relational Algebra in DBMS
Fundamental Operators
5. Set Intersection (∩): set intersection in relational algebra is the same as the set intersection operation in set theory.
Example: for the FRENCH and GERMAN tables above, set intersection is used as follows:
π(Student_Name)FRENCH ∩ π(Student_Name)GERMAN
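
The three set operations on the projected student names map directly onto Python sets. The names below are hypothetical, since the original FRENCH and GERMAN tables are not reproduced here.

```python
# Hypothetical results of pi(Student_Name)FRENCH and pi(Student_Name)GERMAN.
french = {"Ram", "Mohan", "Vivek"}
german = {"Vivek", "Geeta", "Mohan"}

union = french | german          # students taking French or German
difference = french - german     # students taking French but not German
intersection = french & german   # students taking both

# Intersection can be expressed with difference alone: R ∩ S = R − (R − S).
derived_intersection = french - (french - german)

print(sorted(union), sorted(difference), sorted(intersection))
```

All three operations require both operands to have the same attributes, which is why the projection onto Student_Name is applied first.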

• Relational Algebra in DBMS
Fundamental Operators
6. Rename (ρ): rename is a unary operation used for renaming the attributes of a relation.
ρ(a/b)R renames the attribute 'b' of relation R to 'a'.

• Relational Algebra in DBMS
Fundamental Operators
7. Cross Product (×): the cross product of two relations A and B, written A × B, has all the attributes of A followed by all the attributes of B. Each record of A is paired with every record of B.
Example:
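
As a small illustration of the pairing, here is a sketch in Python. The relations `A` and `B` and their contents are hypothetical, and the sketch assumes the two relations have no attribute names in common.

```python
from itertools import product

# Hypothetical relations with disjoint attribute names.
A = [{"name": "Ann", "age": 30}, {"name": "Bob", "age": 25}]
B = [{"course": "DB"}, {"course": "OS"}, {"course": "AI"}]

def cross(left, right):
    """A x B: every tuple of left combined with every tuple of right."""
    return [{**l, **r} for l, r in product(left, right)]

pairs = cross(A, B)
print(len(pairs))   # |A| * |B| rows
```

The result size is the product of the input sizes, which is why an optimizer tries to avoid materializing a cross product.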

• Relational Algebra in DBMS
Derived Operators
These are some of the operators that are derived from the fundamental operators.
– Natural Join (⋈)
– Conditional Join
1. Natural Join (⋈): natural join is a binary operator. The natural join of two relations is the set of all combinations of their tuples that have equal values on their common attributes.
• Relational Algebra in DBMS
Derived Operators
Natural Join (⋈) Example
Natural join between EMP and DEPT with the condition:
EMP.Dept_Name = DEPT.Dept_Name
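
A minimal sketch of this join in Python. Only the relation names EMP and DEPT and the attribute Dept_Name come from the slide; the other attributes, the rows, and the helper `natural_join` are assumptions made for illustration.

```python
# Hypothetical contents; only Dept_Name is known from the slide.
EMP = [
    {"Name": "A", "Dept_Name": "HR"},
    {"Name": "B", "Dept_Name": "IT"},
    {"Name": "C", "Dept_Name": "Sales"},
]
DEPT = [
    {"Dept_Name": "HR", "Manager": "X"},
    {"Dept_Name": "IT", "Manager": "Y"},
]

def natural_join(left, right):
    """Combine tuples that agree on every attribute the relations share."""
    if not left or not right:
        return []
    common = [a for a in left[0] if a in right[0]]
    return [
        {**l, **r}                      # the common attribute appears once
        for l in left
        for r in right
        if all(l[a] == r[a] for a in common)
    ]

joined = natural_join(EMP, DEPT)
print(joined)
```

Employee C has no matching department, so it does not appear in the result.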

• Relational Algebra in DBMS
Derived Operators
2. Conditional Join: a conditional join works like a natural join, except that the join condition is stated explicitly.
In a natural join, the condition is by default equality on the common attributes, while in a conditional join we can specify any condition, such as greater than, less than, or not equal.

• Relational Algebra in DBMS
Derived Operators
Conditional Join Example
Join between R and S with the condition R.marks >= S.marks
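
The condition R.marks >= S.marks can be sketched in Python as below. The rows and the `id` attribute are hypothetical, and prefixing attribute names with `R.` and `S.` is just one way to keep the two `marks` columns apart.

```python
# Hypothetical relations; only the attribute "marks" comes from the slide.
R = [{"id": 1, "marks": 70}, {"id": 2, "marks": 50}]
S = [{"id": 7, "marks": 60}, {"id": 8, "marks": 50}]

def theta_join(left, right, condition):
    """Pair every left tuple with every right tuple that meets the condition."""
    return [
        {**{f"R.{k}": v for k, v in l.items()},
         **{f"S.{k}": v for k, v in r.items()}}
        for l in left
        for r in right
        if condition(l, r)
    ]

joined = theta_join(R, S, lambda r, s: r["marks"] >= s["marks"])
matched_ids = [(row["R.id"], row["S.id"]) for row in joined]
print(matched_ids)
```

A conditional join is equivalent to a cross product followed by a selection on the join condition.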

Query Optimization
• The activity of choosing an efficient execution strategy for processing a query.
• As there are many equivalent transformations of the same high-level query, the aim of query optimization (QO) is to choose the one that minimizes resource usage.
• Generally, the goal is to reduce the total execution time of the query; it may also be to reduce the response time of the query.
• The problem is computationally intractable for a large number of relations, so the strategy adopted is reduced to finding a near-optimal solution.
An Example (Branch and Staff Relations)

Example:
Identify all managers who work in a London branch
SELECT * FROM Staff s, Branch b
WHERE s.branchNo = b.branchNo AND
(s.position = 'Manager' AND b.city = 'London');
This results in these equivalent relational algebra statements:

(1) σ((position=‘Manager’) ∧ (city=‘London’) ∧ (Staff.branchNo=Branch.branchNo)) (Staff × Branch)

(2) σ((position=‘Manager’) ∧ (city=‘London’)) (Staff ⋈(Staff.branchNo=Branch.branchNo) Branch)

(3) (σ(position=‘Manager’) (Staff)) ⋈(Staff.branchNo=Branch.branchNo) (σ(city=‘London’) (Branch))

Assume:
– 1000 tuples in Staff.
– 50 Managers.
– 50 tuples in Branch.
– 5 London branches.
– No indexes or sort keys.
– Results of any intermediate operations are stored on disk.
Query 1 (Bad)
σ((position=‘Manager’) ∧ (city=‘London’) ∧ (Staff.branchNo=Branch.branchNo)) (Staff × Branch)
– Requires (1000+50) disk accesses to read the Staff and Branch relations
– Creates a temporary relation for the Cartesian product with (1000*50) tuples, which is written to disk
– Requires (1000*50) disk accesses to read the temporary relation back and test the predicate
Total Work = (1000+50) + 2*(1000*50)
= 101,050 I/O operations
Query 2 (Better)
σ((position=‘Manager’) ∧ (city=‘London’)) (Staff ⋈(Staff.branchNo=Branch.branchNo) Branch)
– Again requires (1000+50) disk accesses to read Staff and Branch
– Joins Staff and Branch on branchNo, producing 1000 tuples (each employee works at exactly one branch), which are written to disk
– Requires 1000 disk accesses to read the joined relation back and check the predicate
Total Work = (1000+50) + 2*(1000)
= 3,050 I/O operations

Query 3 (Best)
(σ(position=‘Manager’) (Staff)) ⋈(Staff.branchNo=Branch.branchNo) (σ(city=‘London’) (Branch))
– Read the Staff relation to determine the ‘Managers’ (1000 reads)
• Create a 50-tuple relation (50 writes)
– Read the Branch relation to determine the ‘London’ branches (50 reads)
• Create a 5-tuple relation (5 writes)
– Join the reduced relations and check the predicate (50 + 5 reads)
Total Work = 1000 + 2*(50) + 5 + (50 + 5)
= 1,160 I/O operations
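
The three totals can be reproduced with a few lines of Python. The tuple counts are the ones assumed in the example; the variable names are illustrative, and the model counts one disk access per tuple read or written, as the slides do.

```python
# Tuple counts assumed in the example.
STAFF, BRANCH = 1000, 50
MANAGERS, LONDON = 50, 5

# Query 1: read both relations, write the Cartesian product, read it back.
cost_q1 = (STAFF + BRANCH) + 2 * (STAFF * BRANCH)

# Query 2: read both relations, write the 1000-tuple join, read it back.
cost_q2 = (STAFF + BRANCH) + 2 * STAFF

# Query 3: read Staff, write managers, read Branch, write London branches,
# then read both reduced relations for the join.
cost_q3 = STAFF + MANAGERS + BRANCH + LONDON + (MANAGERS + LONDON)

print(cost_q1, cost_q2, cost_q3)
print(round(cost_q1 / cost_q3, 1))  # how many times cheaper Query 3 is
```

Under these assumptions the first strategy costs roughly 87 times as much as the third, even though all three return the same answer.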

Dynamic versus Static Optimization
• There are two choices for when the first three phases of query processing (QP) can be carried out:
1. Dynamically, every time the query is run.
– The advantages of dynamic QO arise from the fact that the information is up to date.
– The disadvantages are that the performance of the query is affected, and that time may limit finding the optimum strategy.
2. Statically, when the query is first submitted.
– The advantages of static QO are the removal of the runtime overhead and more time to find the optimum strategy.
– The disadvantages arise from the fact that the chosen execution strategy may no longer be optimal when the query is run.
• A hybrid approach could be used to overcome this.

Query Processing Steps
• Processing can be divided into: Decomposition, Optimization, and Code Generation & Execution
Query Processing Steps
1. Query Decomposition
• The process of transforming a high-level query into a relational algebra query and checking that the query is syntactically and semantically correct.
• Typical stages in query decomposition are:
– Analysis
– Normalization
– Semantic Analysis
– Simplification
– Query Restructuring

Query Processing Steps
1. Query Decomposition (Analysis)
Analyze the query lexically and syntactically using compiler techniques.
• Verify that the relations and attributes exist.
• Verify that the operations are appropriate for the object type.
• Example: SELECT staff_no FROM Staff
WHERE position > 10;
• This query would be rejected on two grounds:
– staff_no is not defined for the Staff relation (it should be StaffNo).
– The comparison ‘>10’ is incompatible with the type of position, which is a variable-length character string.
Query Processing Steps
1. Query Decomposition (Analysis)
• Finally, the query is transformed into some internal representation more suitable for processing.
• Some kind of query tree is typically chosen, constructed as follows:
– A leaf node is created for each base relation.
– A non-leaf node is created for each intermediate relation produced by a relational algebra (RA) operation.
– The root of the tree represents the query result.
– The sequence of operations is directed from the leaves to the root.

Query Processing Steps
1. Query Decomposition (Relational Algebra Tree)
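
One way to picture such a tree is the sketch below, which builds a tree for the Staff and Branch query from the earlier example. The `Node` class and `render` helper are hypothetical and only illustrate the leaf, non-leaf, and root roles; they are not the representation used by any particular DBMS.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str                                   # relation name or RA operation
    children: list = field(default_factory=list)

    def is_leaf(self):
        return not self.children                 # leaves are base relations

def render(node, depth=0):
    """Return the tree as indented lines, root first."""
    lines = ["  " * depth + node.label]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines

staff = Node("Staff")
branch = Node("Branch")
tree = Node("JOIN Staff.branchNo = Branch.branchNo", [
    Node("SELECT position = 'Manager'", [staff]),
    Node("SELECT city = 'London'", [branch]),
])

print("\n".join(render(tree)))
```

Evaluation proceeds from the leaves upward: each non-leaf node consumes the relations produced by its children, and the root yields the query result.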

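Query Processing Steps
1. Query Decomposition (Normalization)
• Converts the query into a normalized form for easier manipulation.
• The predicate can be converted into one of two forms:
– Conjunctive normal form: a conjunction (AND) of clauses, each of which is a disjunction (OR) of simple conditions.
– Disjunctive normal form: a disjunction (OR) of clauses, each of which is a conjunction (AND) of simple conditions.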
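
The two forms describe the same predicate, which a truth table can confirm. In the sketch below the conditions are hypothetical stand-ins: `p` for position = 'Manager', `q` for a salary condition, and `r` for a branch condition; none of these specific predicates is taken from the slides.

```python
from itertools import product

def cnf(p, q, r):
    # Conjunctive normal form: (p OR q) AND r
    return (p or q) and r

def dnf(p, q, r):
    # Disjunctive normal form: (p AND r) OR (q AND r)
    return (p and r) or (q and r)

# Compare the two forms on all 8 combinations of truth values.
equivalent = all(
    cnf(*values) == dnf(*values)
    for values in product([False, True], repeat=3)
)
print(equivalent)
```

The DNF version is obtained from the CNF version by distributing AND over OR.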

Query Processing Steps
1. Query Decomposition (Semantic Analysis)
• Rejects normalized queries that are incorrectly formulated or contradictory.
• A query is incorrectly formulated if some of its components do not contribute to the generation of the result.
• A query is contradictory if its predicate cannot be satisfied by any tuple.
• Algorithms to determine correctness exist only for queries that do not contain disjunction and negation.

Query Processing Steps
1. Query Decomposition (Semantic Analysis)
For these queries, we could construct:
• A relation connection graph.
• A normalized attribute connection graph.
Relation connection graph:
• Create a node for each relation and a node for the result.
• Create an edge between two nodes that represent a join, and an edge between nodes that represent the source of a projection operation.
• If the graph is not connected, the query is incorrectly formulated.

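The connectivity test on a relation connection graph can be sketched as a simple graph traversal. The node names, the `Client` relation, and the edge lists below are hypothetical; they stand for a query whose join condition on one relation has been forgotten.

```python
def is_connected(nodes, edges):
    """Return True if every node can be reached from the first one."""
    adjacency = {n: set() for n in nodes}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        for neighbour in adjacency[stack.pop()]:
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return seen == set(nodes)

nodes = ["Staff", "Branch", "Client", "RESULT"]

# Two joins plus a projection edge to the result node.
good_edges = [("Staff", "Branch"), ("Staff", "Client"), ("Staff", "RESULT")]

# Same query with the join to Client omitted: Client is left isolated.
bad_edges = [("Staff", "Branch"), ("Staff", "RESULT")]

print(is_connected(nodes, good_edges), is_connected(nodes, bad_edges))
```

An isolated relation means the query would silently form a cross product with it, which is almost always a formulation error.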
Query Processing Steps
1. Query Decomposition (Simplification)
• The simplification strategy:
– Detects redundant qualifications,
– Eliminates common sub-expressions,
– Transforms the query into a semantically equivalent but more easily and efficiently computed form.
• Typically, access restrictions, view definitions, and integrity constraints are considered.
• Assuming the user has the appropriate access privileges, first apply the well-known idempotency rules of Boolean algebra.
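
The slides do not list the rules themselves. The sketch below checks, by truth table, a set of Boolean identities that are commonly grouped under this heading; which of them a given optimizer actually applies is an assumption here.

```python
from itertools import product

# Each rule maps truth values p, q to True when the identity holds for them.
rules = {
    "p AND p = p":           lambda p, q: (p and p) == p,
    "p OR p = p":            lambda p, q: (p or p) == p,
    "p AND false = false":   lambda p, q: (p and False) == False,
    "p OR false = p":        lambda p, q: (p or False) == p,
    "p AND true = p":        lambda p, q: (p and True) == p,
    "p OR true = true":      lambda p, q: (p or True) == True,
    "p AND (NOT p) = false": lambda p, q: (p and not p) == False,
    "p OR (NOT p) = true":   lambda p, q: (p or not p) == True,
    "p AND (p OR q) = p":    lambda p, q: (p and (p or q)) == p,
    "p OR (p AND q) = p":    lambda p, q: (p or (p and q)) == p,
}

# A rule is valid if it holds for every combination of truth values.
holds = {
    name: all(rule(p, q) for p, q in product([False, True], repeat=2))
    for name, rule in rules.items()
}
print(holds)
```

For example, a WHERE clause of the form `position = 'Manager' AND position = 'Manager'` reduces to a single condition by the first rule.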

2. Query Optimization
• Everyone wants the performance of their database to be optimal. In particular, there is often a requirement for a specific query, or an object that is based on a query, to run faster.
• The problem of query optimization is to find the sequence of steps that produces the answer to the user's request in the most efficient manner, given the database structure.
• The performance of a query is affected by the tables or queries that underlie it and by the complexity of the query itself.
• Given a request for data manipulation or retrieval, an optimizer will choose an optimal plan for evaluating the request from among the many alternative strategies, i.e. there are many ways (access paths) of accessing the desired file or record.
• Hence, the DBMS is responsible for picking the best execution strategy based on various considerations (such as the least amount of I/O and CPU resources).

A. Using Heuristics in Query Optimization

• Query block: the basic unit that can be translated into the algebraic operators and optimized.
• A query block contains a single SELECT-FROM-WHERE expression, as well as GROUP BY and HAVING clauses if these are part of the block.
• Nested queries within a query are identified as separate query blocks.

• Process for heuristic optimization
1. The parser of a high-level query generates an initial internal representation.
2. Heuristic rules are applied to optimize the internal representation.
3. A query execution plan is generated to execute groups of operations based on the access paths available on the files involved in the query.
• The main heuristic is to apply first the operations that reduce the size of intermediate results.
– E.g. apply SELECT and PROJECT operations before applying JOIN or other binary operations.
Summary of Heuristics for Algebraic Optimization:
1. The main heuristic is to apply first the operations that reduce
the size of intermediate results.
2. Perform select operations as early as possible to reduce the
number of tuples and perform project operations as early as
possible to reduce the number of attributes. (This is done by
moving select and project operations as far down the tree as
possible.)
3. The select and join operations that are most restrictive should
be executed before other similar operations. (This is done by
reordering the leaf nodes of the tree among themselves and
adjusting the rest of the tree appropriately.)
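
The effect of moving selections down the tree can be seen by counting intermediate tuples. The sketch below generates synthetic Staff and Branch data matching the sizes assumed in the earlier example (1000 staff, 50 managers, 50 branches, 5 in London); the concrete values and the helper `join` are hypothetical.

```python
# Synthetic data sized like the earlier example (values are made up).
staff = [
    {"staffNo": i,
     "position": "Manager" if i < 50 else "Assistant",
     "branchNo": i % 50}
    for i in range(1000)
]
branch = [
    {"branchNo": b, "city": "London" if b < 5 else "Other"}
    for b in range(50)
]

def join(left, right):
    """Equi-join on branchNo using a simple nested loop."""
    return [{**l, **r} for l in left for r in right
            if l["branchNo"] == r["branchNo"]]

# Selections applied late: join first, then filter.
joined = join(staff, branch)
late = [t for t in joined
        if t["position"] == "Manager" and t["city"] == "London"]

# Selections applied early: filter each relation, then join.
managers = [s for s in staff if s["position"] == "Manager"]
london = [b for b in branch if b["city"] == "London"]
early = join(managers, london)

print(len(joined), len(managers), len(london), len(early))
```

Both orders return the same tuples, but the early-selection plan joins 50 and 5 tuples instead of materializing a 1000-tuple intermediate relation.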

B. Cost Estimation Approach to Query Optimization
• The main idea is to minimize the cost of processing a query. The cost function is composed of:
• I/O cost + CPU processing cost + communication cost + storage cost
• These components might have different weights in different processing environments.
• The DBMS uses information stored in the system catalogue to estimate cost.
• A main target of query optimization is to minimize the size of the intermediate relations. Their size affects the cost of:
– Disk access
– Data transportation
– Storage space in primary memory
– Writing to disk
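
A weighted cost function of this shape can be sketched as follows. Everything numeric here is an assumption for illustration: the weights, the CPU and storage figures, and the two candidate plans (whose I/O figures are borrowed from Query 2 and Query 3 of the earlier example).

```python
from dataclasses import dataclass

@dataclass
class PlanCost:
    io: int             # disk accesses
    cpu: int            # abstract CPU units
    communication: int  # abstract transfer units
    storage: int        # tuples held as intermediate results

def total_cost(cost, weights):
    """Weighted sum of the four cost components."""
    return (weights["io"] * cost.io
            + weights["cpu"] * cost.cpu
            + weights["communication"] * cost.communication
            + weights["storage"] * cost.storage)

# Hypothetical weights for a centralized, disk-bound environment.
weights = {"io": 10, "cpu": 1, "communication": 0, "storage": 1}

plan_a = PlanCost(io=3050, cpu=200, communication=0, storage=1000)
plan_b = PlanCost(io=1160, cpu=300, communication=0, storage=55)

best = min([plan_a, plan_b], key=lambda c: total_cost(c, weights))
print(total_cost(plan_a, weights), total_cost(plan_b, weights))
```

In a distributed environment the communication weight would be raised, which could change which plan is chosen.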
B. Cost Estimation Approach to Query Optimization
1. Access Cost of Secondary Storage
• Data is accessed from secondary storage, because a query needs some part of the data stored in the database. The disk access cost can in turn be analyzed in terms of:
– Searching,
– Reading, and
– Writing the data blocks used to store some portion of a relation.
• Remark: the disk access cost will vary depending on
– The file organization used and the access method implemented for that file organization.
– Whether the data is stored contiguously or scattered across the disk.
2. Storage Cost
• A query is composed of many database operations, so there can be one or more intermediate results before the final output is reached. These intermediate results should be stored in primary memory for further processing. The bigger the intermediate relation, the larger the memory requirement, which puts pressure on the limited space available.

B. Cost Estimation Approach to Query Optimization

3. Computation Cost
• A query is composed of many operations. The operations could be database operations like reading from and writing to a disk, or mathematical and other operations like:
– Searching
– Sorting
– Merging
– Computation on field values

4. Communication Cost
• In most database systems the database resides at one station and queries originate from different terminals. This affects the performance of the system and adds to the cost of query processing. Thus, the cost of transporting data between the database site and the terminal from which the query originates should be analyzed.
3. Query Generation and Execution Plans
– An execution plan for a relational algebra query consists of a
combination of the relational algebra query and information about
the access methods to be used for each relation as well as the
methods to be used in computing the relational operators.
