Advanced Database System Chapter Three Query Processing and Optimization

Chapter Three discusses query processing and optimization, detailing the steps involved in translating SQL into relational algebra, executing queries, and optimizing them for efficiency. It covers techniques such as decomposition, semantic analysis, and the use of heuristics and cost estimates in query optimization. Additionally, it emphasizes the importance of relational algebra in database design and the potential issues that arise from neglecting these concepts.

Uploaded by

yafaone62
Copyright
© All Rights Reserved

Chapter Three
Query Processing and Optimization
Outline

• Overview of Query Processing
• Translating SQL to Relational Algebra
• Algorithms for Executing Queries
• Using Heuristics in Query Optimization
• Cost Estimates in Query Optimization
• Semantic Query Optimization

Query processing and Optimization

• Scanning: the scanner identifies the query tokens, such as SQL keywords, attribute names, and relation names, that appear in the text of the query.
• Parsing: the parser checks the query syntax to determine whether it is formulated according to the syntax rules (rules of grammar) of the query language.
• Validation: the query is validated by checking that all attribute and relation names are valid and semantically meaningful names in the schema of the particular database being queried.
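The scanning and validation steps can be illustrated with a small sketch. It is only an illustration under stated assumptions: the token pattern, the `KEYWORDS` set, the `SCHEMA` dictionary and the function names `scan` and `validate` are hypothetical, the supported SQL subset is tiny, and the parsing (grammar-checking) step is left out.

```python
import re

# Hypothetical, deliberately tiny subset of SQL keywords.
KEYWORDS = {"SELECT", "FROM", "WHERE", "AND", "OR"}
TOKEN_RE = re.compile(
    r"\s*(?:(?P<number>\d+)"
    r"|(?P<string>'[^']*')"
    r"|(?P<name>[A-Za-z_][A-Za-z_0-9]*)"
    r"|(?P<op>[<>]=?|=|[*,()]))"
)

def scan(sql):
    """Scanner: split the query text into (kind, text) tokens."""
    sql = sql.strip()
    tokens, pos = [], 0
    while pos < len(sql):
        match = TOKEN_RE.match(sql, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        kind = match.lastgroup
        text = match.group(kind)
        if kind == "name" and text.upper() in KEYWORDS:
            kind = "keyword"
        tokens.append((kind, text))
        pos = match.end()
    return tokens

def validate(tokens, schema):
    """Validation: return the names that are not relations or attributes in the schema."""
    keyword_pos = {t.upper(): i for i, (k, t) in enumerate(tokens) if k == "keyword"}
    start = keyword_pos["FROM"]
    end = keyword_pos.get("WHERE", len(tokens))
    # Names between FROM and WHERE are treated as relation names.
    relations = [t for i, (k, t) in enumerate(tokens) if k == "name" and start < i < end]
    known_attrs = set()
    for relation in relations:
        known_attrs |= schema.get(relation, set())
    unknown = [r for r in relations if r not in schema]
    unknown += [t for i, (k, t) in enumerate(tokens)
                if k == "name" and not (start < i < end) and t not in known_attrs]
    return unknown

# Hypothetical schema used only for this example.
SCHEMA = {"Catalog": {"title", "price", "authorid"}}
tokens = scan("SELECT title FROM Catalog WHERE price > 200")
good = validate(tokens, SCHEMA)
bad = validate(scan("SELECT colour FROM Catalog"), SCHEMA)
print(tokens)
print(good, bad)
```

A real DBMS would run a full grammar-based parser between these two steps and would also resolve aliases and qualified names; the sketch only shows where token identification ends and schema checking begins.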
What is query processing?

• Query processing is the process of translating a high-level query, such as SQL, into a low-level query that can be executed by the database system.
• This involves parsing, validating, and optimizing the query, as well as generating a query execution plan. A query execution plan is a sequence of operations that the database system performs to retrieve the data requested by the query.

Chapter Three- Query Processing and Optimization


Query processing

• Query processing can be divided into four phases: decomposition, optimization, code generation, and execution.
1. Query Decomposition
• Query decomposition transforms a high-level query into a relational algebra query and checks that the query is syntactically and semantically correct. It consists of parsing and validation.
Typical stages in query decomposition are:

i. Analysis: lexical and syntactic analysis of the query, checking its correctness against the attributes and data types defined in the schema. A query tree is built for the query, with a leaf node for each base relation, one or more non-leaf nodes for the relations produced by relational algebra operations, and a root node for the result of the query. Operations are executed in sequence from the leaves to the root. Example:
SELECT * FROM Catalog c, Author a WHERE a.authorid = c.authorid AND c.price > 200 AND a.country = 'USA'
ii. Normalization: convert the query into a normalized form. The WHERE predicate is converted to conjunctive normal form (a conjunction, ∧, of disjunctions) or disjunctive normal form (a disjunction, ∨, of conjunctions).
iii. Semantic Analysis: reject normalized queries that are incorrectly formulated or contradictory. A query is incorrectly formulated if some of its components do not contribute to generating the result. It is contradictory if its predicate cannot be satisfied by any tuple, for example (Catalog = "BS" ∧ Catalog = "CS"), since a given book can be classified in only one category at a time.
iv. Simplification: detect redundant qualifications, eliminate common sub-expressions, and transform the query into a semantically equivalent form that is easier and more efficient to compute. Access restrictions are also considered at this stage: if a user does not have the necessary access to all of the objects in the query, the query should be rejected.
2. Query Optimization
What is Query Optimization?

– The activity of choosing a single efficient execution strategy (from possibly hundreds of alternatives), guided by database catalog statistics.
– Which relational algebra expression, equivalent to the given query, will lead to the most efficient solution plan?
– For each algebraic operator, which algorithm (of the several available) do we use to compute that operator?
– How do operations pass data to one another (main memory buffer, disk buffer, …)?
• Everyone wants the performance of their database to be optimal. In particular, there is often a requirement for a specific query, or an object based on a query, to run faster.
• The problem of query optimization is to find the sequence of steps that produces the answer to the user's request in the most efficient manner, given the database structure.
• The performance of a query is affected by the tables or queries that underlie it and by the complexity of the query.
Cont’d…

• Given a request for data manipulation or retrieval, an optimizer chooses a plan for evaluating the request from among many alternative strategies; that is, there are many ways (access paths) of accessing the desired file or record.
• Hence, the DBMS is responsible for picking the best execution strategy it can find based on various considerations, such as using the least amount of I/O and CPU resources.
Cont’d…

• A query typically has many possible execution strategies, and the process of choosing a suitable one for processing a query is known as query optimization.
• The chosen strategy is not necessarily the optimal (or absolute best) strategy; it is just a reasonably efficient strategy for executing the query.
…continued
• There are two main techniques that are employed during
query optimization.
• The first technique is based on heuristic rules for ordering the
operations in a query execution strategy. A heuristic is a rule that
works well in most cases but is not guaranteed to work well in
every case. The rules typically reorder the operations in a query
tree.
• The second technique involves systematically estimating the
cost of different execution strategies and choosing the execution
plan with the lowest cost estimate. These techniques are usually
combined in a query optimizer.

…continued
• Example: Consider relations r(A, B) and s(C, D). We require the Cartesian product r × s.
• Method 1:
a. Load the next record of r into RAM.
b. Load all records of s, one at a time, and concatenate each with the current record of r.
c. Have all records of r been concatenated?
– NO: go to a.
– YES: exit (the result is in RAM or on disk).
• Performance: too many accesses, because s is read in full once for every record of r.
…continued
• Method 2: Improvement
a. Load as many blocks of r as possible, leaving room for one block of s.
b. Run through the s file completely, one block at a time.
c. Repeat from a. with the next blocks of r until all of r has been processed.
• Performance: reduces the number of times s blocks are loaded by a factor equal to the number of r records that can fit in main memory.
• Considerations during query optimization:
– Narrow down intermediate result sets quickly: apply SELECT and PROJECT before JOIN.
– Use access structures (indexes).
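The two methods for computing r × s described above can be compared with a short simulation. This is a minimal sketch, not a DBMS implementation: relations are plain Python lists, a "block" is just a fixed-size slice, and the function names, `block_size` and `buffer_blocks` are assumptions introduced for the illustration. The counters record how many times a block of s is loaded.

```python
def blocks(records, block_size):
    """Split a list of records into fixed-size blocks."""
    return [records[i:i + block_size] for i in range(0, len(records), block_size)]

def product_record_at_a_time(r, s, block_size):
    """Method 1: for every record of r, read the whole of s."""
    s_blocks = blocks(s, block_size)
    result, s_block_reads = [], 0
    for r_rec in r:
        for s_blk in s_blocks:
            s_block_reads += 1
            for s_rec in s_blk:
                result.append(r_rec + s_rec)
    return result, s_block_reads

def product_block_nested(r, s, block_size, buffer_blocks):
    """Method 2: keep (buffer_blocks - 1) blocks of r in memory, stream s block by block."""
    if buffer_blocks < 2:
        raise ValueError("need at least one block for r and one for s")
    r_blocks = blocks(r, block_size)
    s_blocks = blocks(s, block_size)
    chunk = buffer_blocks - 1  # one buffer block is reserved for s
    result, s_block_reads = [], 0
    for i in range(0, len(r_blocks), chunk):
        r_in_memory = [rec for blk in r_blocks[i:i + chunk] for rec in blk]
        for s_blk in s_blocks:
            s_block_reads += 1
            for s_rec in s_blk:
                for r_rec in r_in_memory:
                    result.append(r_rec + s_rec)
    return result, s_block_reads

r = [(a, a * 10) for a in range(8)]    # r(A, B), 8 records
s = [(c, c * 100) for c in range(6)]   # s(C, D), 6 records
out1, reads1 = product_record_at_a_time(r, s, block_size=2)
out2, reads2 = product_block_nested(r, s, block_size=2, buffer_blocks=3)
print(len(out1), reads1, reads2)
```

With these sample sizes, four records of r fit in memory at once under Method 2, so s is scanned twice instead of eight times while both methods produce the same 48 result tuples.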
Using Heuristics in Query Optimization

• In practice, SQL is the query language that is used in most commercial RDBMSs. An SQL query is first translated into an equivalent extended relational algebra expression, represented as a query tree data structure, that is then optimized.
• Typically, SQL queries are decomposed into query blocks, which form the basic units that can be translated into the algebraic operators and optimized.
How to Measure Query Performance?

• To measure the performance of your query execution plan, you need tools and metrics that help you analyze and compare different plans.
– EXPLAIN is a command that shows the query execution plan and the estimated cost of each operation.
– Profiling is a feature that shows the actual time and resources used by each operation.
– Monitoring is a tool that tracks and displays the overall performance and health of your database system, such as CPU, memory, disk, and network usage.
• Common metrics for measuring query performance include execution time, disk I/O, memory usage, and network traffic.
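As an illustration of inspecting a plan, the sketch below uses SQLite through Python's built-in `sqlite3` module and its `EXPLAIN QUERY PLAN` statement. The table, column and index names are hypothetical, the exact wording of the plan text differs between SQLite versions, and other systems expose their own variants of EXPLAIN with different output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table used only for this example.
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO employees (name, age) VALUES (?, ?)",
    [(f"emp{i}", 20 + i % 40) for i in range(1000)],
)

def plan(sql):
    """Return the human-readable detail column of SQLite's query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[-1] for row in rows]  # the last column holds the description

query = "SELECT name FROM employees WHERE age = 30"
plan_before = plan(query)   # expected to describe a full table scan
conn.execute("CREATE INDEX idx_employees_age ON employees(age)")
plan_after = plan(query)    # expected to mention the new index

print(plan_before)
print(plan_after)
```

Comparing the two plans shows the optimizer switching from scanning the whole table to searching the index once a suitable access structure exists, which is the kind of difference EXPLAIN is meant to reveal.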

How to Improve Query Performance?

• Applying established techniques and best practices can improve the performance of your query execution plan and of the database system as a whole. Indexing, partitioning, caching, and tuning are some of the most common methods.
• Indexing creates and maintains indexes on columns used in conditions, joins, or sorting to speed up data access and reduce disk I/O.

Cont. …

• Partitioning divides a large table into smaller chunks based on criteria such as date or range to limit scans to specific partitions and distribute workloads across multiple servers.
• Caching stores frequently accessed data in fast temporary storage such as memory or SSD to reduce disk I/O and network traffic.
• Tuning adjusts the parameters and settings of your database system to optimize its performance for your workload and environment, including changing buffer size, cache size, concurrency level, or query optimizer mode.
Query Processing and Relational Algebra
• The relationship between query processing and relational algebra lies in the fact that relational algebra provides a formal foundation for expressing queries, and query processing involves transforming these algebraic expressions into an efficient execution plan.
Relational Algebra
• Relational algebra is a formal mathematical language for expressing queries on relational databases. It consists of a set of operators that operate on relations (tables) to retrieve, filter, and combine data. The basic relational algebra operators include:

Cont. …

1. Selection (σ): Selects rows that satisfy a given predicate.

2. Projection (π): Retrieves specific columns from a relation.

3. Union (∪): Combines two relations to create a new relation with all unique rows.

4. Intersection (∩): Retrieves common rows between two relations.

5. Difference (-): Retrieves rows from one relation that are not present in another.

6. Cartesian Product (×): Combines all possible pairs of rows from two relations.

7. Join (⨝): Combines rows from two relations based on a specified condition.

Relational Algebra
• Relational algebra is a theoretical framework that provides a set of operations for manipulating relations (tables) in a relational database. It serves as the foundation for query languages like SQL. Key operations in relational algebra include:
• Selection (σ): Selects rows from a relation that satisfy a given condition.
– Example: σ_{Age > 25}(Employees)

• Projection (π): Selects specific columns from a relation, creating a new relation with only those columns.
– Example: π_{Name, Salary}(Employees)

Cont. …

• Union (∪): Combines two relations to create a new relation containing all unique rows.
– Example: R ∪ S

• Intersection (∩): Returns a relation containing rows that appear in both input relations.
– Example: R ∩ S

• Difference (−): Returns a relation containing rows that appear in the first input relation but not in the second.
– Example: R − S

Cont. …

• Cartesian Product (×): Combines every row from the first relation with every row from the second, creating a new relation with all possible pairs.
– Example: R × S

• Join (⨝): Combines rows from two relations based on a specified condition.
– Example: R ⨝_{A=B} S

Cont. …

• Relational algebra operations are used to express queries and transformations on relational databases in a formal and mathematical way.
• SQL, the most widely used query language for relational databases, is influenced by relational algebra concepts.
• In practice, a DBMS translates SQL queries into a sequence of relational algebra operations during query processing to retrieve the desired results.
• Understanding relational algebra is crucial for designing efficient queries and optimizing database operations.
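The relational algebra operators listed earlier can be sketched in a few lines of Python. This is an illustrative model only: a relation is assumed to be a list of dictionaries (one per row), the function names are hypothetical, and the sample data is invented for the example.

```python
def dedupe(rows):
    """Relations are sets, so drop duplicate rows (keeping first-seen order)."""
    seen, out = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            out.append(row)
    return out

def select(rel, pred):            # σ: keep rows satisfying the predicate
    return [row for row in rel if pred(row)]

def project(rel, attrs):          # π: keep only the listed columns
    return dedupe([{a: row[a] for a in attrs} for row in rel])

def union(r, s):                  # ∪: all unique rows of both relations
    return dedupe(r + s)

def intersection(r, s):           # ∩: rows present in both relations
    return [row for row in dedupe(r) if row in s]

def difference(r, s):             # −: rows of r that are not in s
    return [row for row in dedupe(r) if row not in s]

def product(r, s):                # ×: every pairing (assumes distinct attribute names)
    return [{**a, **b} for a in r for b in s]

def theta_join(r, s, cond):       # ⨝: product followed by a selection
    return select(product(r, s), cond)

# Invented sample data.
employees = [
    {"Name": "Abel", "Age": 31, "Salary": 900},
    {"Name": "Sara", "Age": 24, "Salary": 700},
    {"Name": "Lena", "Age": 40, "Salary": 900},
]
R = [{"A": 1}, {"A": 2}]
S = [{"A": 2}, {"A": 3}]
T = [{"B": 2}, {"B": 3}]

older = select(employees, lambda row: row["Age"] > 25)
salaries = project(employees, ["Salary"])
joined = theta_join(R, T, lambda row: row["A"] == row["B"])
print(older, salaries, union(R, S), intersection(R, S), difference(R, S), joined)
```

Defining the join as a product followed by a selection mirrors the algebraic definition; a real system would use a dedicated join algorithm rather than materializing the full product.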

Why Relational Algebra in Database
• If relational algebra concepts are overlooked or not understood properly in database design and management, several issues may arise:
• Inefficient Queries: Relational algebra forms the foundation of query languages like SQL. Without a solid understanding of relational algebra, developers may write inefficient queries that take longer to execute or consume more resources than necessary.
• Incorrect Query Results: Incorrect application of relational algebra concepts can lead to incorrect query results. For example, misunderstanding joins or projection operations may result in missing or duplicated data in query results.

Cont. …

• Poor Database Design: Relational algebra helps in designing normalized database schemas that reduce data redundancy and maintain data integrity. Ignoring relational algebra may result in poorly designed databases prone to update, insertion, and deletion anomalies.
• Difficulty in Optimization: Relational algebra provides a framework for query optimization. Without understanding these concepts, developers may struggle to optimize queries for better performance, leading to slower database operations and decreased overall system efficiency.

Cont. …

• Limited Query Flexibility: Relational algebra operations provide a wide range of capabilities for querying and manipulating data. Failure to grasp these concepts may limit the types of queries that can be effectively executed against the database.
• Maintenance Challenges: Databases designed without considering relational algebra principles may be harder to maintain and evolve over time. Changes to the database schema or queries may lead to unexpected behavior or require significant rework.

Cont. …

• Reduced Interoperability: Relational algebra is the basis for standard query languages like SQL. Without understanding these foundational principles, developers may struggle to interact with other databases or systems that use relational databases.
• Neglecting relational algebra in database design and management can lead to inefficiencies, errors, and challenges in query optimization, database maintenance, and interoperability.

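Transformation rules for relational algebra with examples

1. Cascade of SELECTION
Rule: Multiple SELECTION operations can be combined into a single SELECTION operation (and a conjunctive SELECTION can be split into a cascade): σ_{c1 ∧ c2}(R) ≡ σ_{c1}(σ_{c2}(R)).
Example:
• Initial Query: σ_{age > 30}(σ_{salary > 50000}(Employees))
• Optimized Query: σ_{salary > 50000 ∧ age > 30}(Employees)
Explanation: Instead of first selecting employees with a salary greater than 50,000 and then selecting those older than 30, you can combine these conditions into a single selection.

2. Commutativity of SELECTION
Rule: The order of SELECTION operations can be interchanged without affecting the result: σ_{c1}(σ_{c2}(R)) ≡ σ_{c2}(σ_{c1}(R)).
Example:
• Initial Query: σ_{age > 30}(σ_{department = 'HR'}(Employees))
• Equivalent Query: σ_{department = 'HR'}(σ_{age > 30}(Employees))
Explanation: Whether you first select employees older than 30 or those in the HR department, the final result will be the same.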
Transformation rules for relational algebra with examples (cont.)

3. Cascade of PROJECTION
Rule: In a sequence of PROJECTION operations, only the last (outermost) one is necessary: π_{L1}(π_{L2}(R)) ≡ π_{L1}(R), where L1 ⊆ L2.
Example:
• Initial Query: π_{name, age}(π_{name, age, salary}(Employees))
• Optimized Query: π_{name, age}(Employees)
Explanation: If you first project the attributes name, age, and salary, and then project only name and age, you can directly project name and age from the start.

4. Commutativity of SELECTION with PROJECTION
Rule: SELECTION and PROJECTION operations can be interchanged if the SELECTION predicate involves only the attributes in the PROJECTION list: π_{L}(σ_{c}(R)) ≡ σ_{c}(π_{L}(R)).
Example:
• Initial Query: σ_{age > 30}(π_{name, age}(Employees))
• Equivalent Query: π_{name, age}(σ_{age > 30}(Employees))
Explanation: If you first project the attributes name and age and then select employees older than 30, or if you first select employees older than 30 and then project name and age, the result is the same.
Transformation rules for relational algebra with examples (cont.)

5. Commutativity of THETA JOIN/Cartesian Product
Rule: The THETA JOIN (⨝) and Cartesian Product (×) operations are commutative, meaning the order of the relations can be swapped without affecting the result: R ⨝_{c} S ≡ S ⨝_{c} R and R × S ≡ S × R.

Example:
• Initial Query: R × S
• Equivalent Query: S × R
Explanation: Whether you join R with S or S with R, the result will be the same set of tuples.
Transformation rules for relational algebra with examples (cont.)

6. Commutativity of SELECTION with THETA JOIN
Rule: If the SELECTION predicate involves only attributes of one of the relations being joined, the SELECTION and JOIN operations can be interchanged.

Case a: SELECTION predicate involves only attributes of one relation
Example:
• Initial Query: σ_{c1}(R ⨝ S)
• Equivalent Query: (σ_{c1}(R)) ⨝ S
Explanation: If the predicate c1 involves only attributes of R, you can apply the SELECTION to R before performing the join.

Case b: SELECTION predicate involves attributes of both relations
Example:
• Initial Query: σ_{c1 ∧ c2}(R ⨝ S)
• Equivalent Query: (σ_{c1}(R)) ⨝ (σ_{c2}(S))
Explanation: If c1 involves only attributes of R and c2 involves only attributes of S, you can first select the tuples from R that satisfy c1 and the tuples from S that satisfy c2, and then join the results.
Transformation rules for relational algebra with examples (cont.)

7. Commutativity of PROJECTION and THETA JOIN
Rule: If the projection list is of the form L1, L2, where L1 involves only attributes of R and L2 involves only attributes of S being joined, and the predicate θ involves only attributes in the projection list, then:
π_{L1, L2}(R ⨝_{θ} S) ≡ (π_{L1}(R)) ⨝_{θ} (π_{L2}(S))

Example:
• Initial Query: π_{L1, L2}(R ⨝_{θ} S)
• Optimized Query: (π_{L1}(R)) ⨝_{θ} (π_{L2}(S))

Explanation: Instead of projecting the attributes after the join, you can project the relevant attributes from each relation before performing the join.
Transformation rules for relational algebra with examples (cont.)

8. Commutativity of the set operations: UNION and INTERSECTION but not SET DIFFERENCE
Rule: UNION and INTERSECTION operations are commutative, but SET DIFFERENCE is not.
Example:
• Initial Query: R ∪ S
• Equivalent Query: S ∪ R
Explanation: The order of the operands of UNION (and likewise INTERSECTION) does not affect the result, whereas in general R − S ≠ S − R.

9. Associativity of the THETA JOIN, CARTESIAN PRODUCT, UNION, and INTERSECTION
Rule: These operations are associative: (R θ S) θ T ≡ R θ (S θ T), where θ stands for the same operation throughout.
Explanation: The order in which you group a sequence of JOIN, CARTESIAN PRODUCT, UNION, or INTERSECTION operations does not affect the final result.
Transformation rules for relational algebra with examples (cont.)

10. Commuting SELECTION with SET OPERATIONS
Rule: SELECTION operations can commute with UNION and INTERSECTION.

Example: σ_{c}(R ∪ S) ≡ (σ_{c}(R)) ∪ (σ_{c}(S))

Explanation: Instead of applying the SELECTION after the UNION, you can apply the SELECTION to each relation before performing the UNION.
Transformation rules for relational algebra with examples (cont.)

11. Commuting PROJECTION with UNION
Rule: PROJECTION operations can commute with UNION.

Example: π_{L}(R ∪ S) ≡ (π_{L}(R)) ∪ (π_{L}(S))

Explanation: Instead of projecting the attributes after the UNION, you can project the relevant attributes from each relation before performing the UNION.
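Several of the transformation rules above can be spot-checked on small sample relations. The sketch below is only a sanity check under stated assumptions: relations are modelled as Python sets of rows, the helper names and all data are invented, and agreement on one data set illustrates a rule without proving it.

```python
def rel(rows):
    """Build a relation as a set of rows; each row is a frozenset of (attribute, value) pairs."""
    return {frozenset(row.items()) for row in rows}

def select(r, pred):
    return {t for t in r if pred(dict(t))}

def project(r, attrs):
    return {frozenset((a, dict(t)[a]) for a in attrs) for t in r}

def join(r, s, cond):
    pairs = ({**dict(a), **dict(b)} for a in r for b in s)
    return {frozenset(p.items()) for p in pairs if cond(p)}

# Invented sample relations.
employees = rel([
    {"name": "Ann", "age": 45, "salary": 60000, "dept_id": 1},
    {"name": "Bob", "age": 28, "salary": 52000, "dept_id": 2},
    {"name": "Cy", "age": 35, "salary": 40000, "dept_id": 1},
])
contractors = rel([
    {"name": "Di", "age": 33, "salary": 58000, "dept_id": 2},
    {"name": "Ed", "age": 29, "salary": 45000, "dept_id": 1},
])
departments = rel([{"dno": 1, "dname": "HR"}, {"dno": 2, "dname": "IT"}])

def older(t):
    return t["age"] > 30

def well_paid(t):
    return t["salary"] > 50000

def same_dept(t):
    return t["dept_id"] == t["dno"]

checks = {
    "rule 1": select(select(employees, well_paid), older)
              == select(employees, lambda t: well_paid(t) and older(t)),
    "rule 2": select(select(employees, well_paid), older)
              == select(select(employees, older), well_paid),
    "rule 3": project(project(employees, ["name", "age", "salary"]), ["name", "age"])
              == project(employees, ["name", "age"]),
    "rule 4": select(project(employees, ["name", "age"]), older)
              == project(select(employees, older), ["name", "age"]),
    "rule 6a": select(join(employees, departments, same_dept), older)
               == join(select(employees, older), departments, same_dept),
    "rule 10": select(employees | contractors, older)
               == select(employees, older) | select(contractors, older),
    "rule 11": project(employees | contractors, ["name"])
               == project(employees, ["name"]) | project(contractors, ["name"]),
}
print(checks)
```

Each entry evaluates both sides of one equivalence and compares the resulting sets, which is the sense in which the optimizer is free to replace one form by the other.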
Translating SQL Queries into Relational Algebra

How to Translate Complex Queries

Using Heuristics
Heuristic optimization in query processing involves using rule-based techniques to transform a query into a more efficient form. Here is a detailed explanation of the process:

Process for heuristic optimization

1. Initial Internal Representation:
• When a high-level query (like SQL) is submitted, the parser translates it into an initial internal representation, often in the form of a relational algebra tree. This tree represents the logical steps needed to execute the query.
Using Heuristics…

2. Applying Heuristic Rules:
• Heuristic rules are applied to this internal representation to optimize it. These rules are based on general principles that typically lead to more efficient query execution. Some common heuristic rules include:
– Selection Pushdown: Moving selection operations as close to the base relations as possible to reduce the size of intermediate results.
– Projection Pushdown: Moving projection operations down the query tree to eliminate unnecessary columns early.
Cont’d…
– Join Reordering: Reordering join operations to minimize the size of intermediate results, often based on the size of the relations involved.
– Combining Operations: Merging adjacent operations that can be executed together more efficiently.
Using Heuristics…

3. Generating a Query Execution Plan:
• After applying heuristic rules, the optimized internal representation is used to generate a query execution plan. This plan outlines the specific steps and methods the DBMS will use to execute the query.
• The execution plan considers the access paths available, such as indexes and sequential scans, to determine the most efficient way to retrieve and process the data.
• The plan may include operations like index scans, nested loop joins, and hash joins.
Using Heuristics…

• The main heuristic is to apply first the operations that reduce the size of intermediate results.
– E.g. apply SELECT and PROJECT operations before applying the JOIN or other binary operations.

• Intermediate results, in the context of database query processing, are the temporary data sets produced during the execution of a query before arriving at the final result. Intermediate results are not stored permanently in the database. They exist only for the duration of the query execution and are discarded once the final result is produced.
…continued
• The heuristic approach uses knowledge of the characteristics of the relational algebra operations and of the relationships between the operators to optimize the query.
• Thus the heuristic approach to optimization makes use of:
– Properties of individual operators
– Association between operators
– Query tree: a graphical representation of the operators, relations, attributes, and predicates, and of the processing sequence during query processing.
• It is composed of three main parts: the leaves, the root, and the intermediate nodes.
– The sequence of execution of operations in a query tree runs from the leaves to the root.
…continued

• Query block: the basic unit that can be translated into the algebraic operators and optimized.
• A query block contains a single SELECT-FROM-WHERE expression, as well as GROUP BY and HAVING clauses if these are part of the block.
• Nested queries within a query are identified as separate query blocks.
• There are two types of nested queries: uncorrelated and correlated.
Uncorrelated Nested Queries

• Uncorrelated nested queries can be evaluated separately, and their results are then used in the outer query.

SELECT name
FROM employees
WHERE department_id IN (SELECT department_id
FROM departments WHERE location = 'New York');

In this example, the inner query (SELECT department_id FROM departments WHERE location = 'New York') is executed first, and its result is used by the outer query to filter employees.
Correlated Nested Queries
• Correlated nested queries need information (a tuple variable) from the outer query in their execution.

SELECT name
FROM employees e
WHERE salary > (SELECT AVG(salary) FROM employees
WHERE department_id = e.department_id);

In this example, the inner query (SELECT AVG(salary) FROM employees WHERE department_id = e.department_id) depends on the department_id of each row in the outer query. Therefore, the inner query is executed for each employee to compare their salary with the average salary of their department.
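Both kinds of nested query can be tried out with Python's built-in `sqlite3` module. This is a minimal sketch: the table layouts follow the two examples above, but the rows are invented, and how a given DBMS actually evaluates the inner query (once, per row, or rewritten as a join) is up to its optimizer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Invented sample data matching the column names used in the examples above.
conn.executescript("""
CREATE TABLE departments (department_id INTEGER PRIMARY KEY, location TEXT);
CREATE TABLE employees (name TEXT, salary REAL, department_id INTEGER);
INSERT INTO departments VALUES (1, 'New York'), (2, 'Boston');
INSERT INTO employees VALUES ('Ann', 900, 1), ('Bob', 500, 1),
                             ('Cy', 700, 2), ('Di', 300, 2);
""")

# Uncorrelated: the inner query does not refer to the outer query.
uncorrelated = conn.execute("""
    SELECT name FROM employees
    WHERE department_id IN (SELECT department_id
                            FROM departments WHERE location = 'New York')
    ORDER BY name
""").fetchall()

# The same result obtained in two separate steps: inner query first, then the outer one.
inner_ids = [row[0] for row in conn.execute(
    "SELECT department_id FROM departments WHERE location = 'New York'")]
placeholders = ",".join("?" * len(inner_ids))
two_step = conn.execute(
    f"SELECT name FROM employees WHERE department_id IN ({placeholders}) ORDER BY name",
    inner_ids,
).fetchall()

# Correlated: the inner query uses e.department_id from the outer row.
correlated = conn.execute("""
    SELECT name FROM employees e
    WHERE salary > (SELECT AVG(salary) FROM employees
                    WHERE department_id = e.department_id)
    ORDER BY name
""").fetchall()

print(uncorrelated, two_step, correlated)
```

The two-step version works only because the inner query is independent of the outer rows; the correlated query has no such standalone inner result, since the average depends on the department of the employee being tested.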
• Query tree:
– A tree data structure that corresponds to a relational algebra expression. It represents the input relations of the query as leaf nodes of the tree, and represents the relational algebra operations as internal nodes.
– Leaves: the base relations used for processing the query and extracting the required information.
– Root: the final result (relation) produced as output by the operations on the relations used for query processing.
– Nodes: intermediate results or relations before reaching the final result.
• An execution of the query tree consists of executing an internal node operation whenever its operands are available and then replacing that internal node by the relation that results from executing the operation.
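The execution rule for a query tree described above (run an internal node once its operands are available, then treat its result as a relation) amounts to a post-order traversal. The following is a minimal sketch under stated assumptions: the `Node` class, the operation names and the sample rows are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, List

@dataclass
class Node:
    op: str                              # "relation", "select", "project" or "join"
    arg: Any = None                      # rows, predicate, or attribute list
    children: List["Node"] = field(default_factory=list)

def execute(node):
    """Evaluate the operands (children) first, then the node's own operation."""
    inputs = [execute(child) for child in node.children]
    if node.op == "relation":            # leaf: a base relation
        return list(node.arg)
    if node.op == "select":
        return [row for row in inputs[0] if node.arg(row)]
    if node.op == "project":
        out = []
        for row in inputs[0]:
            slim = {a: row[a] for a in node.arg}
            if slim not in out:          # projection removes duplicate rows
                out.append(slim)
        return out
    if node.op == "join":
        return [{**left, **right} for left in inputs[0] for right in inputs[1]
                if node.arg({**left, **right})]
    raise ValueError(f"unknown operation: {node.op}")

# Invented sample rows.
project_rows = [
    {"PNUMBER": 10, "PLOCATION": "Stafford", "DNUM": 4},
    {"PNUMBER": 20, "PLOCATION": "Houston", "DNUM": 1},
    {"PNUMBER": 30, "PLOCATION": "Stafford", "DNUM": 4},
]
department_rows = [{"DNUMBER": 4, "MGRSSN": 987}, {"DNUMBER": 1, "MGRSSN": 888}]

# Root: projection; internal nodes: join and selection; leaves: base relations.
tree = Node("project", ["PNUMBER", "MGRSSN"], [
    Node("join", lambda t: t["DNUM"] == t["DNUMBER"], [
        Node("select", lambda t: t["PLOCATION"] == "Stafford",
             [Node("relation", project_rows)]),
        Node("relation", department_rows),
    ]),
])
result = execute(tree)
print(result)
```

Because each call returns a complete relation before its parent runs, every internal node here corresponds to a materialized intermediate result; real systems often pipeline rows between operators instead.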
Query graph

• A query graph is a visual representation used in database theory to illustrate a relational calculus expression. Here is a breakdown of the key points:
– Graph Data Structure: The query graph is a type of graph that visually represents the relationships and constraints of a query.
– Relational Calculus Expression: It corresponds to a relational calculus expression, which is a non-procedural query language used to specify what data to retrieve rather than how to retrieve it.
– No Operation Order: The graph does not specify the order in which operations should be performed. It simply shows the relationships and constraints.
– Uniqueness: Each query has a unique corresponding graph, meaning there is only one graph for each specific query.
…continued
• Example:
– For every project located in 'Stafford', retrieve the project number, the controlling department number, and the department manager's last name, address, and birthdate.
• Relational algebra:

π_{PNUMBER, DNUM, LNAME, ADDRESS, BDATE}(((σ_{PLOCATION='STAFFORD'}(PROJECT)) ⨝_{DNUM=DNUMBER} (DEPARTMENT)) ⨝_{MGRSSN=SSN} (EMPLOYEE))

• SQL query:

SELECT P.PNUMBER, P.DNUM, E.LNAME, E.ADDRESS, E.BDATE
FROM PROJECT AS P, DEPARTMENT AS D, EMPLOYEE AS E
WHERE P.DNUM = D.DNUMBER AND D.MGRSSN = E.SSN AND P.PLOCATION = 'STAFFORD';
…cont
Step 1. Perform selection operations as early as possible: Applying selection at an early stage reduces the number of unwanted records that have to be transferred from the database to primary memory. The optimizer uses transformation rule 1 to divide selection operations with conjunctive conditions into a cascade of selection operations.

Step 2. Commute selection operations with other operations as early as possible: The optimizer uses transformation rules 2, 4, 6, and 9 to move selection operations as far down the tree as possible and to keep selection predicates on the same relation together. Keeping selections low in the tree reduces unwanted data transfer, and keeping selection predicates on the same relation together reduces the number of times the same database table has to be accessed to retrieve records.
… cont

Step 3. Combine a Cartesian product with a subsequent selection operation whose predicate represents a join condition into a JOIN operation: The optimizer uses transformation rule 13 to convert a selection and Cartesian product sequence into a join. This reduces data transfer. It is always better to transfer only the required data from the database than to transfer all the data and then refine it. (A Cartesian product combines all the data of all the tables mentioned in the query, while a join operation retrieves only those records that satisfy the join condition.)
Step 4. Use commutativity and associativity of binary operations: The optimizer uses transformation rules 5, 11, and 12 to execute the most restrictive selection operations first.
Step 5. Perform projection operations as early as possible: After performing selection operations, the optimizer uses transformation rules 3, 4, 7, and 10 to reduce the number of columns of a relation by moving projection operations as far down the tree as possible and keeping projection attributes on the same relation together.
Step 6. Compute common expressions only once: This step identifies sub-trees that represent groups of operations that can be executed by a single algorithm.
• Heuristic Optimization of Query Trees:
– The same query could correspond to many different relational algebra expressions, and hence to many different query trees.
– The task of heuristic optimization of query trees is to find a final query tree that is efficient to execute.
• Example:
Q2: SELECT LNAME
FROM EMPLOYEE, WORKS_ON, PROJECT
WHERE PNAME = 'AQUARIUS' AND
PNUMBER = PNO AND ESSN = SSN AND BDATE > '1957-12-31';
Steps in converting the query tree for Q2 during heuristic optimization (figure panels):
(a) Initial (canonical) query tree for SQL query Q2. Executing this tree directly first creates a very large file containing the CARTESIAN PRODUCT of the entire EMPLOYEE, WORKS_ON, and PROJECT files.
(b) Moving SELECT operations down the query tree. This gives an improved query tree that first applies the SELECT operations to reduce the number of tuples that appear in the CARTESIAN PRODUCT.
(c) Applying the more restrictive SELECT operation first. A further improvement is achieved by switching the positions of the EMPLOYEE and PROJECT relations in the tree, as shown in (c). This uses the information that Pnumber is a key attribute of the PROJECT relation, and hence the SELECT operation on the PROJECT relation will retrieve a single record only.
(d) Replacing CARTESIAN PRODUCT and SELECT with JOIN operations. We can further improve the query tree by replacing any CARTESIAN PRODUCT operation that is followed by a join condition with a JOIN operation.
(e) Moving PROJECT operations down the query tree. Another improvement is to keep only the attributes needed by subsequent operations in the intermediate relations, by including PROJECT (π) operations as early as possible in the query tree, as shown in (e). This reduces the attributes (columns) of the intermediate relations.
Summary of Heuristics for Algebraic Optimization:

1. The main heuristic is to apply first the operations that reduce the size of intermediate results.
2. Perform select operations as early as possible to reduce the number of tuples and perform project operations as early as possible to reduce the number of attributes. (This is done by moving select and project operations as far down the tree as possible.)
3. The select and join operations that are most restrictive should be executed before other similar operations. (This is done by reordering the leaf nodes of the tree among themselves and adjusting the rest of the tree appropriately.)
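The effect of these heuristics on intermediate result sizes can be illustrated on query Q2 with invented data. The sketch below is an assumption-laden toy: relations are lists of dictionaries, the row counts are arbitrary, and only tuple counts (not I/O) are compared between the canonical tree and a tree with selections pushed down and Cartesian products replaced by joins.

```python
# Invented sample data for EMPLOYEE, PROJECT and WORKS_ON.
employee = [{"SSN": i, "LNAME": f"L{i}", "BDATE": f"{1950 + i}-01-01"} for i in range(20)]
project = [{"PNUMBER": n, "PNAME": name} for n, name in enumerate(["AQUARIUS", "ORION", "VEGA"])]
works_on = [{"ESSN": e, "PNO": e % 3} for e in range(20)]

# Canonical tree: Cartesian product of all three relations, then one big selection.
cartesian = [{**e, **w, **p} for e in employee for w in works_on for p in project]
canonical = [t for t in cartesian
             if t["PNAME"] == "AQUARIUS" and t["PNUMBER"] == t["PNO"]
             and t["ESSN"] == t["SSN"] and t["BDATE"] > "1957-12-31"]

# Optimized tree: most restrictive selection first, then joins instead of products.
p_sel = [p for p in project if p["PNAME"] == "AQUARIUS"]
step1 = [{**p, **w} for p in p_sel for w in works_on if p["PNUMBER"] == w["PNO"]]
e_sel = [e for e in employee if e["BDATE"] > "1957-12-31"]
step2 = [{**t, **e} for t in step1 for e in e_sel if t["ESSN"] == e["SSN"]]

names_canonical = sorted(t["LNAME"] for t in canonical)
names_optimized = sorted(t["LNAME"] for t in step2)
print(len(cartesian), len(p_sel), len(step1), len(e_sel), len(step2))
print(names_canonical == names_optimized)
```

Both trees return the same last names, but the canonical tree builds 1,200 intermediate tuples while the largest intermediate relation in the optimized tree has 12, which is the point of applying the restrictive operations first.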

B. Cost Estimation Approach to Query Optimization
• The main idea is to minimize the cost of processing a query. The cost function is comprised of:
• I/O cost + CPU processing cost + communication cost + storage cost
• These components might have different weights in different processing environments.
• The DBMS will use information stored in the system catalog for the purpose of estimating cost.
• The main target of query optimization is to minimize the size of the intermediate relations. The size affects the cost of:
• Disk access
• Data transportation
• Storage space in primary memory
• Writing on disk
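The cost function above can be read as a weighted sum of its components. The sketch below shows one way to compare two candidate plans on that basis; every number in it, the weight values, the plan names and the `PlanEstimate` structure are hypothetical and stand in for estimates a real optimizer would derive from catalog statistics.

```python
from dataclasses import dataclass

@dataclass
class PlanEstimate:
    name: str
    io_blocks: float        # estimated disk block accesses
    cpu_ops: float          # estimated in-memory operations
    messages: float         # estimated network messages
    storage_blocks: float   # estimated space for intermediate results

# Hypothetical weights for a centralized, disk-bound environment.
WEIGHTS = {"io": 1.0, "cpu": 0.001, "comm": 0.0, "storage": 0.1}

def cost(plan, weights):
    """Weighted sum: I/O cost + CPU cost + communication cost + storage cost."""
    return (weights["io"] * plan.io_blocks
            + weights["cpu"] * plan.cpu_ops
            + weights["comm"] * plan.messages
            + weights["storage"] * plan.storage_blocks)

# Invented estimates for two execution strategies of the same query.
plans = [
    PlanEstimate("full scan then join", 5000, 2_000_000, 0, 800),
    PlanEstimate("index lookup then join", 300, 150_000, 0, 40),
]
costs = {plan.name: cost(plan, WEIGHTS) for plan in plans}
best = min(plans, key=lambda plan: cost(plan, WEIGHTS))
print(costs, best.name)
```

Changing the weights models a different processing environment: in a distributed setting, for example, the communication weight would no longer be zero and could change which plan is cheapest.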

• Cost-based query optimization:
• Estimate and compare the costs of executing a
query using different execution strategies and
choose the strategy with the lowest cost estimate.
(Compare to heuristic query optimization)
• Issues
• Cost function
• Number of execution strategies to be considered
• Cost Components for Query Execution
1. Access cost to secondary storage
2. Storage cost
3. Computation cost
4. Memory usage cost
5. Communication cost
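The idea of estimating and comparing strategies can be sketched for join ordering. All cardinalities and selectivities below are hypothetical catalog statistics, and the cost measure (the sum of estimated intermediate result sizes) is a simplifying assumption, not the cost model of a real optimizer.

```python
from itertools import permutations

# Hypothetical catalog statistics. PROJECT is assumed to be already
# reduced to 10 tuples by an earlier selection.
cardinality = {"EMPLOYEE": 10000, "WORKS_ON": 50000, "PROJECT": 10}

# Hypothetical join selectivities for pairs that share a join attribute.
selectivity = {
    frozenset(["EMPLOYEE", "WORKS_ON"]): 1 / 10000,
    frozenset(["WORKS_ON", "PROJECT"]): 1 / 100,
}


def estimate_cost(order):
    """Sum of estimated intermediate result sizes for a left-deep join order."""
    joined = {order[0]}
    size = cardinality[order[0]]
    cost = 0.0
    for rel in order[1:]:
        # Apply every join predicate linking rel to what is already joined.
        # If there is none, the step is a Cartesian product (factor 1.0).
        factor = 1.0
        for other in joined:
            factor *= selectivity.get(frozenset([rel, other]), 1.0)
        size = size * cardinality[rel] * factor
        cost += size
        joined.add(rel)
    return cost


# Enumerate every strategy, estimate its cost, and keep the cheapest.
plans = {order: estimate_cost(order) for order in permutations(cardinality)}
best = min(plans, key=plans.get)
```

With only three relations there are six orders to consider. The number of strategies grows factorially with the number of relations, which is the second issue listed above.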
1. Access Cost of Secondary Storage
• A query needs some part of the data stored in the database, so data
must be accessed from secondary storage. The disk access cost can in
turn be analyzed in terms of the data blocks used to store some
portion of a relation:
– Searching for blocks
– Reading blocks, and
– Writing blocks
• Remark: The disk access cost will vary depending on
– The file organization used and the access method
implemented for the file organization.
– Whether the data is stored contiguously or in a
scattered manner.
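How file organization changes the access cost can be sketched with common textbook approximations: an equality search on a key in an unordered file reads about half the blocks on average, while a binary search on a file ordered by that key reads about log2 of the block count. The file sizes and function names below are hypothetical.

```python
import math


def num_blocks(num_records, record_size, block_size):
    """Blocks needed when records do not span block boundaries."""
    blocking_factor = block_size // record_size  # records per block
    return math.ceil(num_records / blocking_factor)


def linear_search_cost(b):
    """Average block reads for an equality search on a key, unordered file."""
    return b / 2


def binary_search_cost(b):
    """Approximate block reads for a binary search on an ordered file."""
    return math.ceil(math.log2(b))


# Hypothetical file: 10,000 records of 100 bytes in 4096-byte blocks.
b = num_blocks(10_000, 100, 4096)
unordered = linear_search_cost(b)
ordered = binary_search_cost(b)
```

These figures count block accesses only; they ignore seek time and the contiguous-versus-scattered placement mentioned in the remark above.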

2. Storage Cost
• Because a query is composed of many database operations,
processing it may produce one or more intermediate results
before the final output is reached. These intermediate results
should be stored in primary memory for further processing.
The bigger the intermediate relation, the larger the memory
requirement, which puts pressure on the limited available
space. This is counted as the cost of storage.
3. Query Execution Plans

– An execution plan for a relational algebra
query consists of a combination of the
relational algebra query tree and
information about the access methods to be
used for each relation as well as the
methods to be used in computing the
4. Computation Cost
• A query is composed of many operations. These could be database
operations such as reading from and writing to a disk, or mathematical and
other operations such as:
• Searching
• Sorting
• Merging
• Computation on field values
5. Communication Cost
• In most database systems the database resides at one
station and queries originate from different
terminals. This affects the performance of the
system by adding to the cost of query processing. Thus,
the cost of transporting data between the database site
and the terminal from which the query originates
should be analyzed.
Semantic Query Optimization

• Semantic query optimization is a process in database management systems
(DBMS) that goes beyond traditional query optimization techniques based
on syntactic analysis. While traditional query optimization focuses on
transforming a query into an equivalent but more efficient form by considering
the query's syntax, semantic query optimization takes into account the
meaning or semantics of the query and the underlying data model.
• Here are some aspects and considerations associated with semantic query
optimization:

04/23/2025 Query processing and Optimization



• Query Rewrite: Semantic query optimization involves analyzing the
structure and meaning of the query to rewrite it in a more efficient way.
This may include identifying equivalent expressions or transformations
that can improve performance.
• Semantic Caching: By understanding the semantics of the query, the
system may employ caching mechanisms more effectively. Caching
results of semantically equivalent queries can lead to improved
performance by avoiding redundant computations.
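Semantic caching can be sketched under a deliberately narrow assumption: two queries on the same table are treated as equivalent when their conjunctive conditions are the same set, regardless of the order in which they were written. The class, function names, and the stand-in executor below are hypothetical.

```python
def cache_key(table, conditions):
    """Order-insensitive key: a conjunction means the same in any term order."""
    return (table, frozenset(conditions))


class SemanticCache:
    def __init__(self):
        self._store = {}
        self.hits = 0

    def run(self, table, conditions, execute):
        """Return a cached result for an equivalent query, else execute it."""
        key = cache_key(table, conditions)
        if key in self._store:
            self.hits += 1
            return self._store[key]
        result = execute(table, conditions)
        self._store[key] = result
        return result


executions = []


def fake_execute(table, conditions):
    """Stand-in for the real executor; records each call it receives."""
    executions.append((table, tuple(conditions)))
    return f"rows of {table}"


cache = SemanticCache()
q1 = [("dno", "=", 5), ("salary", ">", 30000)]
q2 = [("salary", ">", 30000), ("dno", "=", 5)]  # same meaning, different order
first = cache.run("EMPLOYEE", q1, fake_execute)
second = cache.run("EMPLOYEE", q2, fake_execute)
```

The second call is answered from the cache even though its text differs from the first. A real system would also have to invalidate cached entries when the underlying table changes, which this sketch omits.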


• Statistical Information: Semantic query optimization may take advantage of
statistical information about the data distribution and relationships between
tables. This information can be used to make informed decisions about query
execution plans.
• Schema Knowledge: Understanding the schema and relationships between
tables allows the optimizer to make intelligent decisions. Semantic
optimization considers foreign key relationships, data dependencies, and
constraints to optimize the query plan.
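One way constraints can help is by showing that a query cannot return any rows, so it need not be executed at all. The sketch below assumes a hypothetical range constraint on a salary attribute and a simple predicate format; both are illustrative, not the representation used by any real optimizer.

```python
# Hypothetical CHECK constraint known from the schema: 0 <= salary <= 100000.
constraints = {"salary": (0, 100000)}


def is_unsatisfiable(predicates, constraints):
    """Return True if some predicate contradicts a known range constraint.

    predicates: list of (attribute, op, value) with op in {'>', '>=', '<', '<='}.
    """
    for attr, op, value in predicates:
        if attr not in constraints:
            continue
        low, high = constraints[attr]
        if op == ">" and value >= high:
            return True
        if op == ">=" and value > high:
            return True
        if op == "<" and value <= low:
            return True
        if op == "<=" and value < low:
            return True
    return False


# A query asking for salary > 150000 can be answered with an empty result
# without touching the data; salary > 50000 must still be executed.
skip_query = is_unsatisfiable([("salary", ">", 150000)], constraints)
run_query = not is_unsatisfiable([("salary", ">", 50000)], constraints)
```

The check uses only schema knowledge, so its cost is negligible compared with scanning the relation.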


• Semantic Indexing: The use of semantic indexing involves exploiting
the semantics of data to build more effective index structures. This can
lead to faster query execution times by leveraging the structure and
relationships within the data.
• Query Decomposition: Semantic optimization may involve breaking
down complex queries into simpler, semantically equivalent subqueries.
This decomposition allows the optimizer to consider different execution
strategies for each subquery.


• Query Plan Selection: The optimizer, considering semantic information,
selects the best execution plan for a given query. This includes deciding
on join order, access methods, and other aspects of query execution.
• Cost Estimation: Semantic optimization includes accurate cost
estimation based on semantic information. This involves estimating the
cost of different query execution plans and selecting the one with the
lowest estimated cost.


• Semantic query optimization aims to enhance the
efficiency and performance of database queries by
considering not only the syntactic structure of the
queries but also their underlying semantics.
• This approach can lead to more intelligent and
context-aware optimization decisions in database
systems.
Traditional vs Semantic Query
Optimization



Session Summary
