
DE_Module5_QueryOptimization

The document provides an overview of the internal workings of Relational Database Management Systems (RDBMS), focusing on database structures, query processing, and optimization techniques. It explains the steps involved in query processing, including parsing, optimization, and evaluation, while detailing various operations and rules for efficient query execution. Additionally, it discusses cost-based and heuristic optimization algorithms, highlighting their importance in improving query performance.

Module – V

INTERNALS OF RDBMS
Introduction:
A database (DB) is a collection of homogeneous sets of data, with relationships defined among them, stored in
permanent memory and used by means of a DBMS, a piece of software that provides the following key features:
➢ A language for defining the database schema, the restrictions on allowable values of the data (integrity
constraints), and the relationships among data sets.
➢ The data structures for the storage and efficient retrieval of large amounts of data in permanent memory.
➢ A language that allows authorized users to store and manipulate data.
➢ A transaction mechanism to protect data from hardware and software malfunctions and from unwanted interference
during concurrent access by multiple users.
Terminologies:
o Query: A query is a request for information from a database.
o Query Plan: A query plan (or query execution plan) is an ordered set of steps used to access data in a
SQL relational database management system.
o Query Optimization:
▪ A single query can be executed through different algorithms or rewritten in different forms and
structures.
▪ The query optimizer attempts to determine the most efficient way to execute a given query by
considering the possible query plans.
▪ The goal of query optimization is to reduce the system resources required to fulfill a query, and
ultimately to provide the user with the correct result set faster.
Query Processing:
➢ Query processing refers to the activities involved in answering a query: translating high-level language (HLL) queries into operations at
the physical file level, applying query optimization transformations, and actually evaluating the queries.
A query expressed in a high-level query language such as SQL must first be scanned, parsed, and validated.

The steps involved in processing a query are:

➢ Parsing and translation
➢ Optimization
➢ Evaluation

Parsing and translation:

➢ First, the given query is translated into its internal form.
➢ The parser checks the syntax of the user's query, verifies that the relation names appearing in the
query exist in the database, and so on.
➢ The system constructs a parse-tree representation of the query, which it then translates into a
relational-algebra expression.
Optimization:
A relational algebra expression may have many equivalent expressions.
➢ Example:
select balance from account where balance < 2500
➢ The relational algebra form is:
Π balance (σ balance<2500 (account))
which is equivalent to
σ balance<2500 (Π balance (account))

➢ Each relational algebra operation can be evaluated using one of several different algorithms.
➢ A sequence of primitive operations that can be used to evaluate a query is called a query execution
plan or query-evaluation plan.

➢ An index (denoted in the figure as "index 1") on balance has been used for the selection
operation in order to find accounts with balance < 2500.
➢ Amongst all equivalent evaluation plans, the one with the lowest cost is chosen.
➢ Cost is estimated using statistical information from the database catalog, such as the number of tuples in
each relation and the size of the tuples.
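
The equivalence of the two relational algebra expressions for the balance query can be checked on a small instance. The following is a minimal Python sketch, not how a real RDBMS evaluates queries: the account rows and the helper names select and project are assumptions made purely for illustration.

```python
# Toy account relation; the rows are made-up illustration data.
account = [
    {"account_number": "A-101", "branch_name": "Downtown", "balance": 500},
    {"account_number": "A-102", "branch_name": "Perryridge", "balance": 2400},
    {"account_number": "A-201", "branch_name": "Brighton", "balance": 2500},
    {"account_number": "A-215", "branch_name": "Mianus", "balance": 7000},
    {"account_number": "A-217", "branch_name": "Brighton", "balance": 500},
]


def select(predicate, relation):
    """sigma: keep the tuples that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def project(attributes, relation):
    """Pi: keep only the named attributes and drop duplicate tuples."""
    result = []
    for row in relation:
        reduced = {a: row[a] for a in attributes}
        if reduced not in result:
            result.append(reduced)
    return result


def low(row):
    return row["balance"] < 2500


plan_a = project(["balance"], select(low, account))  # select first, then project
plan_b = select(low, project(["balance"], account))  # project first, then select

print(plan_a == plan_b)
```

Both plans return the same tuples, but they touch different amounts of data along the way, which is why the optimizer compares their estimated costs.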
Evaluation:
The query-execution engine takes a query-evaluation plan, executes that plan, and returns the answers
to the query.

Query Optimization
Strictly speaking, optimization means choosing the best of all possible options. A query optimizer does not
consider every possible option, so query optimization is really query improvement. Query optimization is a function of many RDBMSs in which multiple
query plans are examined and a good query plan is identified. The approaches are:
i. Rewriting the query in a more effective form, and
ii. Estimating the cost of various execution strategies for the query.
The system first translates the query into its internal form. Optimization then begins by finding an equivalent
expression that is more efficient, after which the system selects a detailed strategy for processing the query. The final choice of
a strategy is based on the number of disk accesses required.

Equivalent Expressions
➢ The first step is to find a relational algebra expression that is equivalent to the given query and is
efficient to execute.
➢ Two relational algebra expressions are said to be equivalent if they generate the same
set of tuples on every legal database instance.

i. Selection Operation
Rules for optimization are
a. Perform the select operation as early as possible.
b. Conjunctive selection operations can be deconstructed into a sequence of individual selections. This is
called a sigma-cascade.

Replace σ P1 ∧ P2 (e) by σ P1 (σ P2 (e)),
where P1 and P2 are predicates and e is a relational algebra expression.
σ P1 ∧ P2 (e) = σ P1 (σ P2 (e)) = σ P2 (σ P1 (e))

➢ Selection operations are commutative.
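
The sigma-cascade and the commutativity of selections can be checked on sample data. This is a small Python sketch under stated assumptions: the account tuples, the predicates p1 and p2, and the select helper are all hypothetical.

```python
# Made-up account tuples: (account_number, branch_name, balance).
account = [
    ("A-101", "Downtown", 500),
    ("A-102", "Perryridge", 2400),
    ("A-201", "Brighton", 2500),
    ("A-215", "Mianus", 7000),
    ("A-217", "Brighton", 500),
]


def select(predicate, relation):
    """sigma: keep the tuples that satisfy the predicate."""
    return [t for t in relation if predicate(t)]


def p1(t):
    return t[2] > 1000          # balance > 1000


def p2(t):
    return t[1] == "Brighton"   # branch_name = "Brighton"


conjunctive = select(lambda t: p1(t) and p2(t), account)  # sigma P1 AND P2 (e)
cascade_12 = select(p1, select(p2, account))              # sigma P1 (sigma P2 (e))
cascade_21 = select(p2, select(p1, account))              # sigma P2 (sigma P1 (e))

print(conjunctive == cascade_12 == cascade_21)
```

Because the order does not affect the result, an optimizer is free to apply the more restrictive predicate first.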

ii. Project Operation

Projection reduces the size of relations, so the rules are:
- Apply projections early.
- Only the last in a sequence of projection operations is needed; the others can
be omitted. This is called a pi-cascade.

Selections can be combined with Cartesian products and theta joins.
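
Both statements above can be illustrated with a short Python sketch. It assumes the standard readings of the two rules: in a pi-cascade only the outermost projection matters, and a selection applied to a Cartesian product gives the same result as a theta join with the same condition. The emp and dept rows and all helper names are hypothetical.

```python
from itertools import product

# Made-up relations; attribute names are kept distinct so tuples can be merged.
emp = [
    {"name": "Anil", "emp_dno": 1},
    {"name": "Bina", "emp_dno": 2},
    {"name": "Chitra", "emp_dno": 1},
]
dept = [
    {"dno": 1, "floor": 2},
    {"dno": 2, "floor": 5},
]


def select(predicate, relation):
    return [row for row in relation if predicate(row)]


def project(attributes, relation):
    """Pi: keep the named attributes and drop duplicate tuples."""
    result = []
    for row in relation:
        reduced = {a: row[a] for a in attributes}
        if reduced not in result:
            result.append(reduced)
    return result


def cartesian_product(r, s):
    return [{**a, **b} for a, b in product(r, s)]


def theta_join(theta, r, s):
    return [{**a, **b} for a in r for b in s if theta({**a, **b})]


def theta(row):
    return row["emp_dno"] == row["dno"]


# Pi-cascade: the inner projection can be omitted.
cascade = project(["name"], project(["name", "emp_dno"], emp))
single = project(["name"], emp)

# A selection over a Cartesian product equals a theta join.
via_product = select(theta, cartesian_product(emp, dept))
via_join = theta_join(theta, emp, dept)

print(cascade == single, via_product == via_join)
```

The product-then-select form builds all 6 combined tuples before filtering, whereas the theta join keeps only the 3 matching ones, which is the reason for combining the two operations.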

iii. Natural Join Operation

Rules for conversion are
a. Choose an optimal ordering of the natural join operations.
Since natural join is associative,
(R1 ⋈ R2) ⋈ R3 = R1 ⋈ (R2 ⋈ R3) (the result is the same, but the cost of computing it may differ)

Similarly, natural join is commutative:

(R1 ⋈ R2) = (R2 ⋈ R1)
b. Choose an optimal ordering of the theta join operations.
Theta join is associative in the following sense:
(R1 ⋈θ1 R2) ⋈θ2∧θ3 R3 = R1 ⋈θ1∧θ3 (R2 ⋈θ2 R3), where θ2 involves attributes from only R2 and R3.

Similarly, theta join is commutative:
R1 ⋈θ R2 = R2 ⋈θ R1
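
The remark that the computation may differ even though the result is the same can be made concrete. The sketch below uses a naive nested-loop natural join; the relations r1, r2, r3 and their contents are made up so that one join order produces a much larger intermediate result than the other.

```python
def natural_join(r, s):
    """Nested-loop natural join on all attribute names the inputs share."""
    if not r or not s:
        return []
    common = set(r[0]) & set(s[0])
    return [{**a, **b} for a in r for b in s
            if all(a[c] == b[c] for c in common)]


def as_set(relation):
    """Order-independent view of a relation, for comparing results."""
    return {frozenset(row.items()) for row in relation}


# Hypothetical relations r1(a, b), r2(b, c), r3(c, d).
r1 = [{"a": i, "b": 0} for i in range(4)]
r2 = [{"b": 0, "c": j} for j in range(3)]
r3 = [{"c": 0, "d": "x"}]

r12 = natural_join(r1, r2)            # intermediate result of (r1 join r2)
left_deep = natural_join(r12, r3)     # (r1 join r2) join r3

r23 = natural_join(r2, r3)            # intermediate result of (r2 join r3)
right_deep = natural_join(r1, r23)    # r1 join (r2 join r3)

print(len(r12), len(r23), as_set(left_deep) == as_set(right_deep))
```

On this data the first ordering materializes 12 intermediate tuples and the second only 1, yet both produce the same 4 result tuples.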

iv. Union and Intersection are commutative.

(R1 ∪ R2) = (R2 ∪ R1)
(R1 ∩ R2) = (R2 ∩ R1)

v. Union and Intersection are associative.

(R1 ∪ R2) ∪ R3 = R1 ∪ (R2 ∪ R3)
(R1 ∩ R2) ∩ R3 = R1 ∩ (R2 ∩ R3)

vi. Other Operations

Selection operation distributes over the union, intersection, and difference operations.
a. σ P (R1 ∪ R2) = σ P (R1) ∪ σ P (R2)
b. σ P (R1 - R2) = σ P (R1) - σ P (R2)
c. σ P (R1 ∩ R2) = σ P (R1) ∩ σ P (R2)

vii. Projection operation distributes over the union operation.

Π L (R1 ∪ R2) = Π L (R1) ∪ Π L (R2)

Π A1, A2 (σ c (R)) = σ c (Π A1, A2 (R)), if c involves only A1 and A2.
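
These distribution rules map directly onto Python set operations, which makes them easy to spot-check. The sketch below is only a check on one made-up instance, not a proof; the relations r1 and r2, the predicate p, and the helpers are assumptions.

```python
# Made-up relations as sets of (account_number, balance) tuples.
r1 = {("A-101", 500), ("A-102", 2400), ("A-201", 2500)}
r2 = {("A-201", 2500), ("A-215", 7000), ("A-217", 500)}


def select(predicate, relation):
    return {t for t in relation if predicate(t)}


def project(positions, relation):
    """Pi on attribute positions; a set removes duplicates automatically."""
    return {tuple(t[i] for i in positions) for t in relation}


def p(t):
    return t[1] >= 2000   # balance >= 2000


checks = {
    "selection over union":
        select(p, r1 | r2) == select(p, r1) | select(p, r2),
    "selection over difference":
        select(p, r1 - r2) == select(p, r1) - select(p, r2),
    "selection over intersection":
        select(p, r1 & r2) == select(p, r1) & select(p, r2),
    "projection over union":
        project([1], r1 | r2) == project([1], r1) | project([1], r2),
    # The condition uses only the projected attribute (balance), so the
    # selection and the projection can be swapped.
    "selection and projection swap":
        project([1], select(p, r1))
        == select(lambda t: t[0] >= 2000, project([1], r1)),
}

for rule, holds in checks.items():
    print(rule, holds)
```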

Heuristic Rule:
The heuristic rule is to apply select (σ) and project (Π) operations before applying the join (⋈) or other binary
operations.

Example:
instructor(ID, name, dept_name, salary)
teaches(ID, course_id, sec_id, semester, year)
course(course_id, title, dept_name, credits)

Query 1: Find the names of all instructors in the “Physics” department, along with the titles of the
courses that they teach.

Optimized Query:

Query 2: Find the names of all instructors in the “CSE” department who have taught a course in 2009, along
with the titles of the courses that they taught.

Optimized Query:
Using join associativity and then the "perform selection early" rule:
Π name, title ((σ dept_name="CSE" (instructor) ⋈ σ year=2009 (teaches)) ⋈ Π course_id, title (course))
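
The benefit of performing selections before joins in Query 2 can be seen by counting intermediate tuples. This is a hedged Python sketch: the sample rows are invented, the helpers are naive list-based stand-ins for the relational operators, and course is projected to (course_id, title) before joining so that the natural join does not also match on dept_name.

```python
def select(predicate, relation):
    return [row for row in relation if predicate(row)]


def project(attributes, relation):
    result = []
    for row in relation:
        reduced = {a: row[a] for a in attributes}
        if reduced not in result:
            result.append(reduced)
    return result


def natural_join(r, s):
    if not r or not s:
        return []
    common = set(r[0]) & set(s[0])
    return [{**a, **b} for a in r for b in s
            if all(a[c] == b[c] for c in common)]


# Made-up sample data for the three schemas in the example.
instructor = [
    {"ID": 1, "name": "Asha", "dept_name": "CSE", "salary": 90},
    {"ID": 2, "name": "Ravi", "dept_name": "Physics", "salary": 80},
    {"ID": 3, "name": "Mira", "dept_name": "CSE", "salary": 95},
]
teaches = [
    {"ID": 1, "course_id": "CS101", "sec_id": 1, "semester": "Fall", "year": 2009},
    {"ID": 1, "course_id": "CS201", "sec_id": 1, "semester": "Spring", "year": 2010},
    {"ID": 2, "course_id": "PH101", "sec_id": 1, "semester": "Fall", "year": 2009},
    {"ID": 3, "course_id": "CS201", "sec_id": 2, "semester": "Fall", "year": 2009},
]
course = [
    {"course_id": "CS101", "title": "Programming", "dept_name": "CSE", "credits": 4},
    {"course_id": "CS201", "title": "Databases", "dept_name": "CSE", "credits": 4},
    {"course_id": "PH101", "title": "Mechanics", "dept_name": "Physics", "credits": 3},
]
course_titles = project(["course_id", "title"], course)

# Unoptimized plan: join everything first, select at the end.
big_join = natural_join(instructor, teaches)
naive = project(
    ["name", "title"],
    select(lambda r: r["dept_name"] == "CSE" and r["year"] == 2009,
           natural_join(big_join, course_titles)),
)

# Optimized plan: apply both selections before the first join.
small_join = natural_join(
    select(lambda r: r["dept_name"] == "CSE", instructor),
    select(lambda r: r["year"] == 2009, teaches),
)
optimized = project(["name", "title"], natural_join(small_join, course_titles))

print(len(big_join), len(small_join), naive == optimized)
```

Even on this tiny instance the first join shrinks from 4 tuples to 2; on realistic table sizes the gap is usually far larger.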

Query tree
➢ It is a tree data structure that corresponds to a relational algebra expression.
➢ It represents the input relations of the query as leaf nodes and the relational algebra operations as
internal nodes.
➢ Execution consists of executing an internal node operation whenever its operands are available and
then replacing that node with the resulting relation.
➢ The heuristic optimizer transforms the initial query tree into a final query tree that is efficient to execute.
➢ It applies the rules for equivalence on the initial tree.
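
A query tree and its bottom-up execution can be sketched in a few lines of Python. The Node class, the operation names, and the sample rows are hypothetical; real systems use far richer plan structures.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class Node:
    op: str                      # "relation", "select", "project" or "join"
    arg: Any = None              # rows, predicate, or attribute list
    children: List["Node"] = field(default_factory=list)


def evaluate(node):
    """Evaluate the operands first, then replace this node by its result."""
    inputs = [evaluate(child) for child in node.children]
    if node.op == "relation":                 # leaf: an input relation
        return node.arg
    if node.op == "select":
        return [row for row in inputs[0] if node.arg(row)]
    if node.op == "project":
        result = []
        for row in inputs[0]:
            reduced = {a: row[a] for a in node.arg}
            if reduced not in result:
                result.append(reduced)
        return result
    if node.op == "join":                     # natural join of two operands
        left, right = inputs
        if not left or not right:
            return []
        common = set(left[0]) & set(right[0])
        return [{**a, **b} for a in left for b in right
                if all(a[c] == b[c] for c in common)]
    raise ValueError(f"unknown operation: {node.op}")


instructor = Node("relation", [
    {"ID": 1, "name": "Asha", "dept_name": "CSE"},
    {"ID": 2, "name": "Ravi", "dept_name": "Physics"},
])
teaches = Node("relation", [
    {"ID": 1, "course_id": "CS101"},
    {"ID": 2, "course_id": "PH101"},
])

# Tree for: project name, course_id (select dept_name = "CSE" (instructor) join teaches)
tree = Node("project", ["name", "course_id"], [
    Node("join", children=[
        Node("select", lambda row: row["dept_name"] == "CSE", [instructor]),
        teaches,
    ]),
])

result = evaluate(tree)
print(result)
```

A heuristic optimizer would rewrite such a tree (for example, moving select nodes towards the leaves) before handing it to an evaluator like this one.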

Examples of Transformations
Branch-schema = (branch-name, branch-city, assets)
Account-schema = (account-number, branch-name, balance)
Depositor-schema = (customer-name, account-number)

Query 1: Display the names of customers having an account in "BBSR" city.

Π customer-name (σ branch-city="BBSR" (branch ⋈ (account ⋈ depositor)))
Optimized Query:
Π customer-name ((σ branch-city="BBSR" (branch)) ⋈ (account ⋈ depositor))

Query 2: Display the names of customers having an account in "BBSR" city with a balance of more than 1000.
Π customer-name (σ branch-city="BBSR" ∧ balance > 1000 (branch ⋈ (account ⋈ depositor)))
Optimized Query:
Π customer-name ((σ branch-city="BBSR" (branch)) ⋈ ((σ balance > 1000 (account)) ⋈ depositor))

Query optimization algorithms:
➢ Several different algorithms can be used for each relational operation, giving rise to alternative
evaluation plans.
➢ Hash join is usually the best algorithm when large, unsorted, and non-indexed inputs are to be
joined.
➢ When no other join algorithm is favoured (for example, because the inputs are neither sorted nor indexed), hash join is used.
➢ If both join inputs are large and of similar size, a merge join with prior sorting
can also give good results.
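
The two join algorithms mentioned above can be sketched as follows. This is an in-memory illustration under simplifying assumptions (a single equality join attribute, inputs that fit in memory, no other shared attribute names); the function names and sample rows are hypothetical.

```python
from collections import defaultdict


def hash_join(r, s, key):
    """Build a hash table on the smaller input, then probe it with the other."""
    build, probe = (r, s) if len(r) <= len(s) else (s, r)
    table = defaultdict(list)
    for row in build:
        table[row[key]].append(row)
    return [{**match, **row}
            for row in probe
            for match in table.get(row[key], [])]


def merge_join(r, s, key):
    """Sort both inputs on the join key, then merge them in one pass."""
    r = sorted(r, key=lambda row: row[key])
    s = sorted(s, key=lambda row: row[key])
    out, i, j = [], 0, 0
    while i < len(r) and j < len(s):
        if r[i][key] < s[j][key]:
            i += 1
        elif r[i][key] > s[j][key]:
            j += 1
        else:
            value = r[i][key]
            i_end, j_end = i, j
            while i_end < len(r) and r[i_end][key] == value:
                i_end += 1
            while j_end < len(s) and s[j_end][key] == value:
                j_end += 1
            # Pair every tuple of the matching group on each side.
            out.extend({**a, **b} for a in r[i:i_end] for b in s[j:j_end])
            i, j = i_end, j_end
    return out


def as_set(relation):
    return {frozenset(row.items()) for row in relation}


emp = [{"name": "Anil", "dno": 1}, {"name": "Bina", "dno": 2},
       {"name": "Chitra", "dno": 1}, {"name": "Dev", "dno": 3}]
dept = [{"dno": 1, "floor": 2}, {"dno": 2, "floor": 5}, {"dno": 4, "floor": 7}]

hashed = hash_join(emp, dept, "dno")
merged = merge_join(emp, dept, "dno")
print(as_set(hashed) == as_set(merged), len(hashed))
```

The hash join needs no ordering on its inputs, while the merge join pays for sorting up front unless the inputs are already sorted on the join attribute.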

Evaluation of an expression containing multiple operations:
➢ An expression with multiple operations can be evaluated broadly in two different ways:
materialization and pipelining.
➢ In materialization, the result of each operation is computed and stored, much as a materialized view is a
view whose contents are computed from its definition and stored.
➢ The result of each evaluation is materialized in a temporary relation for subsequent use.
➢ A disadvantage of this approach is the need to construct the temporary relations, which must be
written to disk.
➢ An alternative approach is to evaluate several operations simultaneously in a pipeline, with the
results of one operation passed on to the next, without the need to store a temporary relation.
➢ Pipelining is the approach of sending the output of one computation as the input to the next
computation.
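
The difference between the two evaluation styles can be sketched with Python lists (materialization) and generators (pipelining). The account rows, the stats counter, and the helper names are assumptions for illustration; the pipelined projection here does not eliminate duplicates.

```python
account = [
    {"account_number": "A-101", "balance": 500},
    {"account_number": "A-215", "balance": 7000},
    {"account_number": "A-102", "balance": 2400},
]
stats = {"scanned": 0}


def scan(relation):
    """Yield tuples one at a time, counting how many have been read."""
    for row in relation:
        stats["scanned"] += 1
        yield row


def low(row):
    return row["balance"] < 2500


def select_pipelined(predicate, rows):
    for row in rows:
        if predicate(row):
            yield row


def project_pipelined(attribute, rows):
    for row in rows:
        yield row[attribute]


# Materialization: each step builds a complete temporary relation.
temp = [row for row in scan(account) if low(row)]
materialized = [row["balance"] for row in temp]
scanned_materialized = stats["scanned"]

# Pipelining: tuples flow through both operators one at a time.
stats["scanned"] = 0
pipeline = project_pipelined("balance", select_pipelined(low, scan(account)))
first = next(pipeline)
scanned_for_first = stats["scanned"]
rest = list(pipeline)

print(materialized, first, scanned_for_first, rest)
```

With pipelining the first result is available after reading a single tuple, and no temporary relation is ever stored.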

Practical query optimizers incorporate elements of the following two broad approaches:
➢ Search all the plans and choose the best plan in a cost-based fashion.
➢ Use heuristics to choose a plan.

Cost-Based Optimization Algorithm:

➢ A cost-based optimizer generates a range of query-evaluation plans from the given query by
using the equivalence rules, and chooses the one with the least cost.
Example:
➢ Suppose we want to find the best join order for r1 ⋈ r2 ⋈ … ⋈ rn.

➢ If n = 3, then 12 join orderings can be formed.

➢ A naive approach has to find the cost of every possible join order to find the best one.
➢ There are (2(n − 1))!/(n − 1)! different join orders for n relations.
➢ For n = 7, the number is 665280, so it is impractical to compute the cost of every possible order.
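
The join-order formula is easy to evaluate directly, and the 12 orderings for n = 3 can be listed by combining every permutation of the relations with the two possible parenthesizations. A small Python sketch (the function name join_order_count is an assumption):

```python
from itertools import permutations
from math import factorial


def join_order_count(n):
    """Number of join orders for n relations: (2(n - 1))! / (n - 1)!"""
    return factorial(2 * (n - 1)) // factorial(n - 1)


# Enumerate the orderings for n = 3: 3! permutations x 2 tree shapes.
orders = []
for a, b, c in permutations(["r1", "r2", "r3"]):
    orders.append(f"({a} ⋈ {b}) ⋈ {c}")
    orders.append(f"{a} ⋈ ({b} ⋈ {c})")

for n in (3, 5, 7, 10):
    print(n, join_order_count(n))
print(len(orders))
```

The count grows so quickly that costing every order individually stops being feasible well before n reaches 10.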
➢ With dynamic programming there is no need to generate the cost of all possible join orders: the least-cost
join order for any subset of {r1, r2, …, rn} is computed only once and stored for future use.

➢ The time complexity of dynamic programming is O(3^n) and the space complexity is O(2^n).
➢ Cost-based optimization is expensive, but worthwhile for queries on large datasets.
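
The dynamic-programming idea can be sketched in Python with memoization over subsets of relations. Everything about the cost model here is a made-up assumption for illustration: the cardinalities, the pairwise selectivities, and the choice to charge each join the estimated size of its result. A real optimizer would use catalog statistics and per-algorithm cost formulas.

```python
from functools import lru_cache
from itertools import combinations

# Hypothetical statistics (not from any real catalog).
cardinality = {"r1": 1000, "r2": 100, "r3": 10}
selectivity = {
    frozenset(("r1", "r2")): 0.01,
    frozenset(("r2", "r3")): 0.1,
}  # pairs without an entry are treated as cross products (selectivity 1.0)


def estimated_size(relations):
    """Estimated number of tuples in the join of the given relations."""
    size = 1.0
    for name in relations:
        size *= cardinality[name]
    for pair in combinations(sorted(relations), 2):
        size *= selectivity.get(frozenset(pair), 1.0)
    return size


@lru_cache(maxsize=None)
def best_plan(relations):
    """Return (cost, plan) for a frozenset of relation names.

    Each subset is solved once and cached, so its best plan is reused
    by every larger subset that contains it.
    """
    if len(relations) == 1:
        (name,) = relations
        return 0.0, name
    best = None
    members = sorted(relations)
    for k in range(1, len(members)):
        for left in combinations(members, k):
            left = frozenset(left)
            right = relations - left
            left_cost, left_plan = best_plan(left)
            right_cost, right_plan = best_plan(right)
            cost = left_cost + right_cost + estimated_size(relations)
            if best is None or cost < best[0]:
                best = (cost, f"({left_plan} ⋈ {right_plan})")
    return best


cost, plan = best_plan(frozenset(cardinality))
print(plan, cost)
```

Trying every split of every subset is what gives the O(3^n) running time, and caching one entry per subset gives the O(2^n) space.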

Heuristic Optimization Algorithm:

➢ Cost-based optimization is expensive even with dynamic programming.
➢ Systems may use heuristics to reduce the number of choices that must be made in a cost-based
fashion.
➢ Heuristic optimization transforms the query tree by using a set of rules that typically improve
execution performance:
o Perform selection early (reduces the number of tuples).
o Perform projection early (reduces the number of attributes).
o Perform the most restrictive selection and join operations (i.e., those with the smallest result size) before other
similar operations.
o Some systems use only heuristics, whereas others combine heuristics with partial cost-based
optimization.
o The heuristic rule is to apply select and project operations before applying the join (⋈) or
other binary operations.

Example:
Let emp(name, age, sal, dno)
dept(dno, dname, floor, mgr, ano)
Question: Display the name and departmental floor of employees getting a salary of more than 100k.
Ans: select name, floor from emp, dept where emp.dno = dept.dno and sal > 100000
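
The query above can be tried out with Python's built-in sqlite3 module. The column types and the sample rows are assumptions added for illustration, and the plan that SQLite reports depends on its version, so no particular plan output is shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dept (dno INTEGER PRIMARY KEY, dname TEXT, floor INTEGER,
                   mgr TEXT, ano INTEGER);
CREATE TABLE emp (name TEXT, age INTEGER, sal INTEGER,
                  dno INTEGER REFERENCES dept(dno));
INSERT INTO dept VALUES (1, 'Sales', 2, 'Rao', 10), (2, 'Research', 5, 'Sen', 20);
INSERT INTO emp VALUES ('Anil', 41, 120000, 1), ('Bina', 29, 80000, 1),
                       ('Chitra', 35, 150000, 2);
""")

query = """
SELECT name, floor
FROM emp, dept
WHERE emp.dno = dept.dno AND sal > 100000
ORDER BY name
"""
rows = conn.execute(query).fetchall()
print(rows)

# Ask SQLite which plan its optimizer chose for this query.
plan_steps = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()
for step in plan_steps:
    print(step)
```

Comparing the reported plan before and after adding an index on emp(sal) or emp(dno) is a simple way to watch an optimizer change its strategy.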


Example 2:
