query_optimization_part1
Instructor:
Peng Xie
Department of
Management
How are SQL Queries Executed?
Typical RDBMS Execution
Step 1, Query
- Suppose we have a query
starName   Title
Star1      Movie1
Star1      Movie2
Star2      Movie1
……         ……
Step 2, Parse tree
One column of
star names
Step 3, Generate query plan
One column of
star names
Step 4, Optimize query plan
Is one query plan better than the other?
Step 6, Calculate cost and pick the best physical plan
- Several different options that trade off complexity, setup time &
performance
- Implementing Vectorization
- Values are stored in columnar arrays (e.g. int[]), with a separate bit array to
mark nulls (e.g., when a record in an array does not meet a condition)
- Tuple batches fit in L1 or L2 cache
Method 2: Vectorization
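The batch-at-a-time idea can be sketched in plain Python. This is only an illustration under stated assumptions: the batch size, the function names and the sample salaries are all hypothetical, a list of booleans stands in for the bit array, and pure Python does not give the cache or SIMD benefits of a real engine.

```python
from array import array

BATCH_SIZE = 4  # hypothetical; real engines size batches to fit in L1/L2 cache


def filter_greater_than(values, valid, threshold):
    """Process one batch: a row stays valid only if it was valid and passes the test.

    No rows are copied or removed; only the validity flags change.
    """
    return [is_valid and value > threshold for value, is_valid in zip(values, valid)]


def sum_valid(values, valid):
    """Sum the values of one batch whose validity flag is still set."""
    return sum(value for value, is_valid in zip(values, valid) if is_valid)


def run_in_batches(column, valid, threshold, batch_size=BATCH_SIZE):
    """Run filter + sum over a whole column, one batch at a time."""
    total = 0
    for start in range(0, len(column), batch_size):
        batch_values = column[start:start + batch_size]
        batch_valid = valid[start:start + batch_size]
        batch_valid = filter_greater_than(batch_values, batch_valid, threshold)
        total += sum_valid(batch_values, batch_valid)
    return total


# One column stored as a typed array, plus a separate validity list.
salaries = array("i", [40000, 60000, 0, 75000, 52000, 30000])
not_null = [True, True, False, True, True, True]  # the third value is NULL
total = run_in_batches(salaries, not_null, 50000)
```

Each operator touches a whole batch per call instead of one tuple per call, which is the point of the method.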
- What is a rule?
- Procedure to simplify part of the query based on a pattern
- Example: whenever the pattern expr OR TRUE appears, for any expression expr,
replace it with TRUE
Rule based optimization
- Implementing a rule
- Each rule is typically a function that walks through query plan to
search for its pattern
Rule based optimization
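A rule written as a function that walks a plan might look like the sketch below. It assumes a hypothetical expression format (nested tuples such as ("or", left, right), ("col", name), ("const", value)); real optimizers use their own node classes.

```python
TRUE = ("const", True)


def simplify_or_true(expr):
    """Rule: rewrite `expr OR TRUE` (or `TRUE OR expr`) to TRUE, bottom-up."""
    op = expr[0]
    if op in ("and", "or"):
        # Simplify the children first, then test this node against the pattern.
        left = simplify_or_true(expr[1])
        right = simplify_or_true(expr[2])
        if op == "or" and TRUE in (left, right):
            return TRUE
        return (op, left, right)
    return expr  # leaves: ("col", name) or ("const", value)


# a AND (b OR TRUE)
predicate = ("and", ("col", "a"), ("or", ("col", "b"), TRUE))
simplified = simplify_or_true(predicate)
```

Here only the inner OR matches the pattern, so the result is a AND TRUE; removing the leftover AND TRUE is the job of a separate rule.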
- Implementing a rule
- Rules are often grouped into phases
- E.g., simplify Boolean expressions, pushdown selects, choose join
algorithms, etc.
- Each phase runs rules till they no longer apply
Simple Rules Can Work Together to Optimize Complex Queries
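A phase that runs its rules until they no longer apply can be sketched as a fixpoint loop. The tuple-based expression format, the two rules and the phase name below are assumptions made for illustration, not a specific system's design.

```python
TRUE = ("const", True)


def or_true(node):
    """expr OR TRUE -> TRUE"""
    if node[0] == "or" and TRUE in node[1:]:
        return TRUE
    return node


def and_true(node):
    """expr AND TRUE -> expr"""
    if node[0] == "and":
        if node[1] == TRUE:
            return node[2]
        if node[2] == TRUE:
            return node[1]
    return node


def apply_everywhere(rule, node):
    """Apply one rule at every node of the tree, top-down (a single pass)."""
    node = rule(node)
    if node[0] in ("and", "or"):
        return (node[0],
                apply_everywhere(rule, node[1]),
                apply_everywhere(rule, node[2]))
    return node


def run_phase(rules, plan):
    """Run all rules of a phase repeatedly until none of them changes the plan."""
    while True:
        before = plan
        for rule in rules:
            plan = apply_everywhere(rule, plan)
        if plan == before:
            return plan


BOOLEAN_SIMPLIFICATION = [and_true, or_true]  # hypothetical phase

# a AND (b OR TRUE)
plan = ("and", ("col", "a"), ("or", ("col", "b"), TRUE))
result = run_phase(BOOLEAN_SIMPLIFICATION, plan)
```

In the first pass only or_true fires, producing a AND TRUE; and_true can apply only in the second pass, which is why the phase loops instead of running each rule once.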
- σ_p (R ⨝ S) = (σ_p R) ⨝ S, when p uses only attributes of R
- The following two queries are equivalent
- σ_{salary>50000} (Employee ⨝ Registration):
    SELECT *
    FROM Employee INNER JOIN Registration
      ON Employee.EmpID = Registration.EmpID
    WHERE salary > 50000;
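The equivalence can be checked on a small example with Python's built-in sqlite3 module. The table contents, the course column and the subquery form of the second query are assumptions added for illustration; only the first query comes from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (EmpID INTEGER, dept TEXT, salary INTEGER);
    CREATE TABLE Registration (EmpID INTEGER, course TEXT);
    INSERT INTO Employee VALUES (1, 'IT', 40000), (2, 'HR', 60000), (3, 'IT', 75000);
    INSERT INTO Registration VALUES (1, 'SQL'), (2, 'SQL'), (3, 'Python'), (3, 'SQL');
""")

# Selection applied after the join.
filter_after_join = """
    SELECT *
    FROM Employee INNER JOIN Registration
      ON Employee.EmpID = Registration.EmpID
    WHERE salary > 50000
"""

# Selection pushed down to Employee before the join.
filter_before_join = """
    SELECT *
    FROM (SELECT * FROM Employee WHERE salary > 50000) AS E
    INNER JOIN Registration
      ON E.EmpID = Registration.EmpID
"""

# Sort both results so the comparison ignores row order.
rows_after = sorted(conn.execute(filter_after_join).fetchall())
rows_before = sorted(conn.execute(filter_before_join).fetchall())
same_result = rows_after == rows_before
```

Both forms return the same rows; the second one simply joins fewer Employee rows.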
- Properties for σ + ⨝
- Let p = predicate with only R attributes
- q = predicate with only S attributes
- m = predicate with both R, S attributes
- Note: predicates are the conditions in the where clause
- σ_{p∧q} (R ⨝ S)
- = σ_p (σ_q (R ⨝ S))
- = σ_p (R ⨝ σ_q S)
- = (σ_p R) ⨝ (σ_q S)
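The chain of equalities above can be checked on toy relations. In this sketch the relations are lists of dicts, and the helper names, sample rows and predicates are all hypothetical.

```python
def select_rows(pred, rel):
    """Relational selection: keep the rows that satisfy pred."""
    return [row for row in rel if pred(row)]


def join(r, s, key):
    """Equi-join of r and s on a shared key attribute (nested loops)."""
    return [{**a, **b} for a in r for b in s if a[key] == b[key]]


R = [{"EmpID": 1, "salary": 40000},
     {"EmpID": 2, "salary": 60000},
     {"EmpID": 3, "salary": 75000}]
S = [{"EmpID": 1, "course": "SQL"},
     {"EmpID": 2, "course": "SQL"},
     {"EmpID": 3, "course": "Python"},
     {"EmpID": 3, "course": "SQL"}]


def p(row):
    return row["salary"] > 50000   # uses only R attributes


def q(row):
    return row["course"] == "SQL"  # uses only S attributes


# Left-hand side: join everything, then filter on p AND q.
joined_all = join(R, S, "EmpID")
late = select_rows(lambda row: p(row) and q(row), joined_all)

# Right-hand side: filter each input first, then join.
early = join(select_rows(p, R), select_rows(q, S), "EmpID")
```

Both sides give the same two rows, but the join on the left-hand side produces four intermediate rows, while the pushed-down version joins only rows that survive p and q.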
- Properties for 𝜋 + σ
- Let x = subset of R attributes, e.g., R = Employee, x = (EmpID, dept)
- z = attributes in predicate p (subset of R attributes), e.g., z = salary
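One standard identity that fits these definitions is π_x (σ_p R) = π_x (σ_p (π_{x∪z} R)): columns outside x and z can be dropped before the selection. Treat this as an assumption about what the property looks like; the sketch below only checks it on hypothetical Employee rows with hypothetical helper names.

```python
def project(attrs, rel):
    """Relational projection (bag semantics): keep only the listed attributes."""
    return [{a: row[a] for a in attrs} for row in rel]


def select_rows(pred, rel):
    """Relational selection: keep the rows that satisfy pred."""
    return [row for row in rel if pred(row)]


Employee = [
    {"EmpID": 1, "dept": "IT", "salary": 40000, "name": "Ann"},
    {"EmpID": 2, "dept": "HR", "salary": 60000, "name": "Bob"},
    {"EmpID": 3, "dept": "IT", "salary": 75000, "name": "Cy"},
]

x = ("EmpID", "dept")  # attributes we want in the output
z = ("salary",)        # attributes the predicate p needs


def p(row):
    return row["salary"] > 50000


# pi_x(sigma_p(R)): filter the full rows, then project.
direct = project(x, select_rows(p, Employee))

# pi_x(sigma_p(pi_{x U z}(R))): drop unused columns (here, name) first.
narrow_first = project(x, select_rows(p, project(x + z, Employee)))
```

The early projection must keep z as well as x, otherwise the predicate could no longer be evaluated.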