0% found this document useful (0 votes)

90 views

2 Chapter 3 Query Optimization

The document discusses query processing and optimization. It covers the steps of query processing which include decomposition, optimization, and execution. Query optimization aims to choose the most efficient execution strategy from many alternatives to minimize resource usage like CPU and disk access. Heuristics and cost-based methods are common optimization techniques. Heuristics apply transformation rules to reduce intermediate result sizes, while cost-based methods evaluate physical execution costs.

Uploaded by

desalegn

Available Formats

Download as PPT, PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

90 views

2 Chapter 3 Query Optimization

Uploaded by

desalegn

Available Formats

Download as PPT, PDF, TXT or read online on Scribd

You are on page 1/ 29

Chapter Three

Query Processing and Optimization

22
2 5/ QUERY OPTIMIZATION 1
/
10
OUTLINE
I. Query Processing and Optimization: Why?

II. Steps of Processing

III. Methods of Optimization

 Heuristic (Logical Transformations)
 Transformation Rules
 Heuristic Optimization Guidelines
 Cost Based (Physical Execution Costs)
 Data Storage/Access Refresher

 Semantics based
22
2 5/ QUERY OPTIMIZATION 2
/
10
INTRODUCTION TO QUERY OPTIMISATION
With declarative languages such as SQL, the user specifies what data is required
 E.g. SELECT name FROM Staff
WHERE gender = ‘F’;
rather than how it is to be retrieved
This relieves the user of knowing what constitutes good execution strategy.

It also gives DBMS more control over system performance.

22
2 5/ QUERY OPTIMIZATION 3
/
10
 What is Query Processing?
 Steps required to transform high level SQL query into a correct and
“efficient” strategy for execution and retrieval.
 What is Query Optimization?
 The activity of choosing a single “efficient” execution strategy
(from hundreds) as determined by database catalog statistics.
 Which relational algebra expression, equivalent to the given query,
will lead to the most efficient solution plan?
 How do operations pass data (main memory buffer, disk buffer,…)?
 Will this plan minimize resource usage? (CPU/Response
Time/Disk)

22
2 5/ QUERY OPTIMIZATION 4
/
10
Example:
Identify all managers who work in a London branch
SELECT * FROM Staff s, Branch b
WHERE s.branchNo = b.branchNo AND s.position = ‘Manager’ AND
b.city = ‘london’;

Results in these equivalent relational algebra statements

(1)(position=‘Manager’)^(city=‘London’)^(Staff.branchNo=Branch.branchNo) (Staff X Branch)
(2) (position=‘Manager’)^(city=‘London’) (Staff Staff.branchNo = Branch.branchNo Branch)
(3) [(position=‘Manager’) (Staff)] Staff.branchNo = Branch.branchNo [(city=‘London’) (Branch)]
Assume:

– 1000 tuples in Staff.

– 50 Managers
– 50 tuples in Branch.
– 5 London branches
– No indexes or sort keys
22 – All temporary results are written back to disk (memory is small)
2 5/ QUERY OPTIMIZATION 5
/
10
– Tuples are accessed one at a time (not in blocks)
QUERY 1 (BAD)
(position=‘Manager’)^(city=‘London’)^(Staff.branchNo=Branch.branchNo) (Staff X Branch)
 Requires (1000+50) disk accesses to read from Staff and Branch relations
 Creates temporary relation of Cartesian Product-mathematical operation that
returns a set of from multiple set (1000*50) tuples
 Requires (1000*50) disk access to read in temporary relation and test
predicate
Total Work = (1000+50) + 2*(1000*50) =
101,050 I/O operations

Query 2 (Better)
(position=‘Manager’)^(city=‘London’) (Staff Staff.branchNo = Branch.branchNo Branch)
– Again requires (1000+50) disk accesses to read from Staff and Branch
– Joins Staff and Branch on branchNo with 1000 tuples
(1 employee : 1 branch )
– Requires (1000) disk access to read in joined relation and check predicate
Total Work = (1000+50) + 2*(1000) =
3050 I/O operations
22
3300% Improvement over Query 1
2 5/ QUERY OPTIMIZATION 6
/
10
QUERY 3 (BEST)
[ (position=‘Manager’) (Staff) ] Staff.branchNo = Branch.branchNo [ (city=‘London’) (Branch) ]

 Read Staff relation to determine ‘Managers’ (1000 reads)

 Create 50 tuple relation(50 writes)

 Read Branch relation to determine ‘London’ branches (50 reads)

 Create 5 tuple relation(5 writes)

 Join reduced relations and check predicate (50 + 5 reads)

Total Work = 1000 + 2*(50) + 5 + (50 + 5) =

1160 I/O operations

8700% Improvement over Query 1

22
2 5/ QUERY OPTIMIZATION 7
/
10
QUERY PROCESSING
STEPS

• Processing can be divided into :Decomposition, Optimization , and

Execution ,Code generation main categories
1. Query Decomposition
It is the process of transforming a high level query into a relational algebra
query, and to check that the query is syntactically and semantically
correct. It Consists of parsing and validation
10/25/22 QUERY OPTIMIZATION 8
TYPICAL STAGES IN QUERY DECOMPOSITION ARE:

i. Analysis: lexical and syntactical analysis of the

query(correctness) based on attributes, data type.. ,.
 Query tree will be built for the query containing leaf node for base
relations,
 one or many non-leaf nodes for relations produced by relational
algebra operations and
 root node for the result of the query.
 Sequence of operation is from the leaves to the root.
• (SELECT * FROM Catalog c ,Author a Where a.authorid = c.authorid
AND c.price>200 AND a.country= ‘ USA’ )
ii. Normalization: convert the query into a normalized
form.
 The predicate WHERE will be converted to Conjunctive (∨) or
Disjunctive (∧) Normal form.
22
2 5/ QUERY OPTIMIZATION 9
/
10
CONT… TYPICAL STAGES IN QUERY DECOMPOSITION ARE:

iii. Semantic Analysis: to reject normalized queries that

are not correctly formulated or contradictory.
 Incorrect if components do not contribute to generate result.
 Contradictory if the predicate can not be satisfied by any tuple. Say
for example,(gender=“m”  gender= “f”) since a given book author
can only be classified in either of the category at a time
iv. Simplification: to detect redundant qualifications,
eliminate common sub-expressions, and transform
the query to a semantically equivalent but more
easily and effectively computed form.
 For example, If a user don’t have the necessary access to all of the
objects of the query , it should be rejected.

22
2 5/ QUERY OPTIMIZATION 10
/
10
2. Query Optimization
 Everyone wants the performance of their database to be optimal. In
particular, there is often a requirement for a specific query or object
that is query based, to run faster.
 Problem of query optimization is to find the sequence of steps that
produces the answer to user request in the most efficient manner,
given the database structure.
 The performance of a query is affected by the tables or queries that
underlies the query and by the complexity of the query.
 Given a request for data manipulation or retrieval, an optimizer
will choose an optimal plan for evaluating the request from among
the manifold alternative strategies. i.e. there are many ways (access
paths) for accessing desired file/record.
 hence ,DBMS is responsible to pick the best execution strategy
based on various considerations( Least amount of I/O and CPU resources. )

22
2 5/ QUERY OPTIMIZATION 11
/
10
22
2 5/ QUERY OPTIMIZATION 12
/
10
22
2 5/ QUERY OPTIMIZATION 13
/
10
22
2 5/ QUERY OPTIMIZATION 14
/
10
22
2 5/ QUERY OPTIMIZATION 15
/
10
A .USING HEURISTICS IN QUERY OPTIMIZATION

 Query block: The basic unit that can be translated into the algebraic
operators and optimized.
 A query block contains a single SELECT-FROM-WHERE expression,
as well as GROUP BY and HAVING clause if these are part of the block.
 Nested queries within a query are identified as separate query blocks.

 Process for heuristics optimization

1. The parser of a high-level query generates an initial internal
representation;
2. Apply heuristics rules to optimize the internal representation.
3. A query execution plan is generated to execute groups of
operations based on the access paths available on the files
involved in the query.
 The main heuristic is to apply first the operations that reduce
the size of intermediate results.
 E.g. Apply SELECT and PROJECT operations before applying
the JOIN or other binary operations.
22
2 5/ QUERY OPTIMIZATION 16
/
10
• Query tree:
– A tree data structure that corresponds to a relational algebra
expression.
– It represents the input relations of the query as leaf nodes of the
tree, and represents the relational algebra operations as internal
nodes.
• Query graph:
– A graph data structure that corresponds to a relational calculus
expression. It does not indicate an order on which operations to
perform first. There is only a single graph corresponding to each
query.
 Example:
 For every project located in ‘Stafford’, retrieve the project number, the controlling
department number and the department manager’s last name, address and birthdate.
 Relation algebra:

 SQL query:
Q2: SELECT P.NUMBER, P.DNUM, E.LNAME, E.ADDRESS, E.BDATE
FROM PROJECT AS P ,DEPARTMENT AS D, EMPLOYEE AS E
WHERE P.DNUM=D.DNUMBER AND
/2
2 D.MGRSSN=E.SSN AND
17
10/2
5 P.PLOCATION=‘STAFFORD’; QUERY OPTIMIZATION
22
2 5/ QUERY OPTIMIZATION 18
/
10
Q2: SELECT P.NUMBER, P.DNUM, E.LNAME, E.ADDRESS, E.BDATE
FROM PROJECT AS P ,DEPARTMENT AS D, EMPLOYEE AS E
WHERE P.DNUM=D.DNUMBER AND
D.MGRSSN=E.SSN AND
P.PLOCATION=‘STAFFORD’;

22
2 5/ QUERY OPTIMIZATION 19
/
10
22
2 5/ QUERY OPTIMIZATION 20
/
10
Heuristic Optimization of Query Trees:
 The same query could correspond to many different relational algebra
expressions — and hence many different query trees.
 The task of heuristic optimization of query trees is to find a final
query tree that is efficient to execute.
Example:
Q: SELECT LNAME
FROM EMPLOYEE, WORKS_ON, PROJECT
WHERE PNAME = ‘AQUARIUS’ AND
PNMUBER=PNO AND ESSN=SSN
AND BDATE > ‘1957-12-31’;

22
2 5/ QUERY OPTIMIZATION 21
/
10
22
2 5/ QUERY OPTIMIZATION 22
/
10
22
2 5/ QUERY OPTIMIZATION 23
/
10
Which tree is better?

 brName  brName

|X| b.branchNo = s.branchNo |X| b.branchNo = s.branchNo

Branch
 s.name = ‘Mary’ Branch
 s.name = ‘Mary’

Staff
Staff

22
2 5/ QUERY OPTIMIZATION 24
/
10
Summary of Heuristics for Algebraic Optimization:
1. The main heuristic is to apply first the operations that reduce
the size of intermediate results.
2. Perform select operations as early as possible to reduce the
number of tuples and perform project operations as early as
possible to reduce the number of attributes. (This is done by
moving select and project operations as far down the tree as possible.)

3. The select and join operations that are most restrictive should
be executed before other similar operations. (This is done by
reordering the leaf nodes of the tree among themselves and
adjusting the rest of the tree appropriately.)

22
2 5/ QUERY OPTIMIZATION 25
/
10
B. Cost Estimation Approach to Query Optimization
The main idea is to minimize the cost of processing a query. The cost
function is comprised of:
I/O cost + CPU processing cost + communication cost + Storage cost
These components might have different weights in different
processing environments
The DBMs will use information stored in the system catalogue for
the purpose of estimating cost.
The main target of query optimization is to minimize the size of the
intermediate relation. The size will have effect in the cost of:
 Disk Access
 Data Transportation
 Storage space in the Primary Memory
 Writing on Disk

22
2 5/ QUERY OPTIMIZATION 26
/
10
1. Access Cost of Secondary Storage
Data is going to be accessed from secondary storage, as a query will be
needing some part of the data stored in the database. The disk access
cost can again be analyzed in terms of:
 Searching
 Reading, and
 Writing, data blocks used to store some portion of a relation.
Remark: The disk access cost will vary depending on
 The file organization used and the access method implemented for the file
organization.
 whether the data is stored contiguously or in scattered manner, will affect the
disk access cost.

2. Storage Cost
•While processing a query, as any query would be composed of many database
operations, there could be one or more intermediate results before reaching the
final output. These intermediate results should be stored in primary memory for
further processing. The bigger the intermediate relation, the larger the memory
requirement, which will have impact on the limited available space.
22
2 5/ QUERY OPTIMIZATION 27
/
10
3. Computation Cost
Query is composed of many operations. The operations could be
database operations like reading and writing to a disk, or
mathematical and other operations like:
 Searching
 Sorting
 Merging
 Computation on field values

4. Communication Cost
o In most database systems the database resides in one station and
various queries originate from different terminals. This will have impact
on the performance of the system adding cost for query processing. Thus,
the cost of transporting data between the database site and the terminal
from where the query originate should be analyzed.

22
2 5/ QUERY OPTIMIZATION 28
/
10
3. Query Execution Plans
– An execution plan for a relational algebra query consists of a
combination of the relational algebra query tree and information
about the access methods to be used for each relation as well as the
methods to be used in computing the relational operators stored in
the tree.

22
2 5/ QUERY OPTIMIZATION 29
/
10

Apache Cassandra Administrator Associate - Exam Practice Tests
From Everand
Apache Cassandra Administrator Associate - Exam Practice Tests
Cristian Scutaru
No ratings yet
Identity and Access Management Policy
100% (3)
Identity and Access Management Policy
4 pages
Assignment-4 Database Design - Models
50% (2)
Assignment-4 Database Design - Models
11 pages
Functional Dependencies and Normalization
No ratings yet
Functional Dependencies and Normalization
7 pages
Unit 1 CYBER SECURITY (AUC-002)
No ratings yet
Unit 1 CYBER SECURITY (AUC-002)
49 pages
Data Structures and Algorithms Assig 1 Fall21
0% (1)
Data Structures and Algorithms Assig 1 Fall21
10 pages
DNS PDF
No ratings yet
DNS PDF
13 pages
Chapter - 1 - Query Optimization
No ratings yet
Chapter - 1 - Query Optimization
38 pages
CH - 2 Query Process
No ratings yet
CH - 2 Query Process
44 pages
Query Processing
No ratings yet
Query Processing
5 pages
Query Processing and Optimization
No ratings yet
Query Processing and Optimization
42 pages
Distributed Query Processing
No ratings yet
Distributed Query Processing
24 pages
DBMS
No ratings yet
DBMS
240 pages
Chapter 1 Introduction - Review Questions
No ratings yet
Chapter 1 Introduction - Review Questions
82 pages
Distributed Database Design
No ratings yet
Distributed Database Design
52 pages
Dbms Quiz
No ratings yet
Dbms Quiz
13 pages
Presentation9 - Query Processing and Query Optimization in DBMS
No ratings yet
Presentation9 - Query Processing and Query Optimization in DBMS
36 pages
Multiple Choice Questions: Principles of Database Management
No ratings yet
Multiple Choice Questions: Principles of Database Management
8 pages
DFD Tutorial
100% (1)
DFD Tutorial
27 pages
Lab 1 Introduction To PROLOG
No ratings yet
Lab 1 Introduction To PROLOG
4 pages
Fundamentals of Database Systems: Lesson 1: Introduction
No ratings yet
Fundamentals of Database Systems: Lesson 1: Introduction
35 pages
C++ Classes and Objects Notes
No ratings yet
C++ Classes and Objects Notes
11 pages
05 - Strategies For Query Processing (Ch18)
No ratings yet
05 - Strategies For Query Processing (Ch18)
50 pages
Procedural Programming
No ratings yet
Procedural Programming
4 pages
Ooad Lab Manual
100% (1)
Ooad Lab Manual
29 pages
DBMS Unit1
No ratings yet
DBMS Unit1
48 pages
Principles of Distributed Database Systems: M. Tamer Özsu Patrick Valduriez
No ratings yet
Principles of Distributed Database Systems: M. Tamer Özsu Patrick Valduriez
76 pages
MCSE 204: Adina Institute of Science & Technology
No ratings yet
MCSE 204: Adina Institute of Science & Technology
16 pages
Question and Answer Set of System Analysis and
100% (1)
Question and Answer Set of System Analysis and
169 pages
Sonali DBMS Notes
100% (13)
Sonali DBMS Notes
61 pages
Introduction To Python Programming: Dr. R. Rajeswara Rao Professor & Head Dept. of CSE Jntuk-Ucev Vizianagaram
No ratings yet
Introduction To Python Programming: Dr. R. Rajeswara Rao Professor & Head Dept. of CSE Jntuk-Ucev Vizianagaram
27 pages
250+ TOP MCQs On SQL Queries and Answers - Quiz
No ratings yet
250+ TOP MCQs On SQL Queries and Answers - Quiz
1 page
Database Assignment
100% (1)
Database Assignment
82 pages
Introduction To Programming and Problem Solving
No ratings yet
Introduction To Programming and Problem Solving
33 pages
C# 04 - Delegates Lambdas and Events
No ratings yet
C# 04 - Delegates Lambdas and Events
11 pages
Study Material INTRODUCTION To Information Systems
No ratings yet
Study Material INTRODUCTION To Information Systems
5 pages
M03 Diagnosing Electronically Controlled PowerTrain Management
No ratings yet
M03 Diagnosing Electronically Controlled PowerTrain Management
62 pages
Chapter 3 - Simple Sorting and Searching
100% (1)
Chapter 3 - Simple Sorting and Searching
18 pages
CS504MCQs - Final - Bydr - TariqHanif PDF
0% (1)
CS504MCQs - Final - Bydr - TariqHanif PDF
21 pages
Compiler Design LAB MANUAL
No ratings yet
Compiler Design LAB MANUAL
10 pages
Mathematical-Economics Solved MCQs (Set-4)
No ratings yet
Mathematical-Economics Solved MCQs (Set-4)
8 pages
Principal of Programming Language
No ratings yet
Principal of Programming Language
67 pages
Final Python Question Bank
No ratings yet
Final Python Question Bank
457 pages
Unit-5 Control Statements
No ratings yet
Unit-5 Control Statements
16 pages
Data Structures Using C++ by DR Varsha Patil: Oxford University Press © 2012
No ratings yet
Data Structures Using C++ by DR Varsha Patil: Oxford University Press © 2012
49 pages
Questions and Answers
No ratings yet
Questions and Answers
26 pages
DBMS Interview Questions (Codeofgeeks) PDF
No ratings yet
DBMS Interview Questions (Codeofgeeks) PDF
16 pages
PHP Assignment
0% (1)
PHP Assignment
37 pages
Introduction To Data Engineering
No ratings yet
Introduction To Data Engineering
8 pages
Oop Practical Questions
No ratings yet
Oop Practical Questions
2 pages
Dbms Question Bank2 Marks 16 Marks
No ratings yet
Dbms Question Bank2 Marks 16 Marks
31 pages
Software Metrics-3
No ratings yet
Software Metrics-3
19 pages
C# Questions & Answers - Integer Data Types - Sanfoundry
No ratings yet
C# Questions & Answers - Integer Data Types - Sanfoundry
7 pages
SQL Material PDF
100% (1)
SQL Material PDF
55 pages
Prolog PPT Updated
No ratings yet
Prolog PPT Updated
45 pages
Solution To Assi1
No ratings yet
Solution To Assi1
8 pages
Cse3002 Internet-And-Web-Programming Eth 1.1 47 Cse3002
No ratings yet
Cse3002 Internet-And-Web-Programming Eth 1.1 47 Cse3002
2 pages
Computer Basics
100% (1)
Computer Basics
26 pages
Data Mining-Outlier Analysis
No ratings yet
Data Mining-Outlier Analysis
6 pages
Data Communication Basics CH 2
No ratings yet
Data Communication Basics CH 2
36 pages
Chapter 4 Query Optimization
100% (2)
Chapter 4 Query Optimization
35 pages
Aptos Whitepaper
100% (1)
Aptos Whitepaper
17 pages
BInDM Demo
No ratings yet
BInDM Demo
5 pages
5 Data Centric Engineering
No ratings yet
5 Data Centric Engineering
23 pages
LTMOM
No ratings yet
LTMOM
171 pages
Dbe PDF
No ratings yet
Dbe PDF
13 pages
Quantifying The Cumulative Impact of Change Orders
No ratings yet
Quantifying The Cumulative Impact of Change Orders
26 pages
UNIT 1 PPT 1
No ratings yet
UNIT 1 PPT 1
27 pages
Dissertation Submitted For The Bachelor of Business Administration (BBA)
No ratings yet
Dissertation Submitted For The Bachelor of Business Administration (BBA)
5 pages
Pgadmin Cheat Sheet
No ratings yet
Pgadmin Cheat Sheet
9 pages
Two Sample T Right Tailed Equal Variance Template
No ratings yet
Two Sample T Right Tailed Equal Variance Template
3 pages
75rt Ad Ccpulse+
No ratings yet
75rt Ad Ccpulse+
94 pages
Unit 2
No ratings yet
Unit 2
15 pages
Backup +recovery
No ratings yet
Backup +recovery
13 pages
SQL Interview Questions
No ratings yet
SQL Interview Questions
2 pages
CS Class 12 Project
100% (1)
CS Class 12 Project
34 pages
Sample Question - Lab Product Testing SQA Testing
No ratings yet
Sample Question - Lab Product Testing SQA Testing
2 pages
Bike Rental Management System: Submitted in Partial Fulfillment For The Requirement of 5 Semester For The
0% (1)
Bike Rental Management System: Submitted in Partial Fulfillment For The Requirement of 5 Semester For The
53 pages
Statistical-Techniques (1)
No ratings yet
Statistical-Techniques (1)
49 pages
ETL Testing Interview Questions and Answers PDF
No ratings yet
ETL Testing Interview Questions and Answers PDF
5 pages
NAL Algorithmic Trading and Computational Finance Using Python Brochure 30 PDF
No ratings yet
NAL Algorithmic Trading and Computational Finance Using Python Brochure 30 PDF
5 pages
Bus System Stallion
No ratings yet
Bus System Stallion
51 pages
Students Attendance Management System: Sarvesh Umashankar Pathak
No ratings yet
Students Attendance Management System: Sarvesh Umashankar Pathak
66 pages
DF100 - 04 - Storage and Retrieval With Arrays
No ratings yet
DF100 - 04 - Storage and Retrieval With Arrays
41 pages
Practice Questions1
No ratings yet
Practice Questions1
18 pages
Harshwardhan.M CV
No ratings yet
Harshwardhan.M CV
4 pages
SQL Notes Basic To Advanced (SQL Clauses)
No ratings yet
SQL Notes Basic To Advanced (SQL Clauses)
10 pages
Chapter 3 Numerical Descriptive Measures
No ratings yet
Chapter 3 Numerical Descriptive Measures
7 pages
Course 3 1 Big SQL
No ratings yet
Course 3 1 Big SQL
49 pages
Case Study OOSE
No ratings yet
Case Study OOSE
34 pages