The document discusses query optimization techniques used by a query optimizer. It describes steps like converting queries to blocks, translating to relational algebra, applying equivalences, estimating costs and result sizes. Estimating reduction factors is important for cardinality estimation.
School of Computer Engineering, KIIT Deemed to be University
Content
1. Steps for query optimization
2. Translating queries into algebra
3. Relational algebra equivalences
4. Cost estimation of query plan
5. Estimation of result size

Query Optimizer
• Steps for query optimization:
  ✓ Queries are converted into blocks
  ✓ Blocks are translated into relational algebra expressions
  ✓ Alternative plans for evaluating these expressions are enumerated
  ✓ The cost of each plan is estimated and the plan with the lowest cost is chosen

Converting into blocks
• SQL queries are optimized by decomposing them into smaller blocks and then optimizing each block.
• A block is an SQL query with no nesting, exactly one SELECT clause and one FROM clause, and at most one each of the WHERE, GROUP BY, and HAVING clauses.

Translating Queries into Algebra
Sailors(sid, sname, rating, age)
Boats(bid, bname, color)
Reserves(sid, bid, day, rname)
• For each sailor with the highest rating and at least two reservations for red boats, find the sailor id and the earliest date on which the sailor has a reservation for a red boat.
    SELECT S.sid, MIN(R.day)
    FROM Sailors S, Reserves R, Boats B
    WHERE S.sid = R.sid AND R.bid = B.bid AND B.color = 'Red'
      AND S.rating = (SELECT MAX(S2.rating) FROM Sailors S2)
    GROUP BY S.sid
    HAVING COUNT(*) > 1;
• Relational algebra expression of the first block:
  SELECT - π (projection)
  WHERE - σ (selection)
  FROM - × (cross product)
• The relational algebra expression is represented as a σπ× expression.
• The optimizer finds the best plan for the σπ× expression and then applies the GROUP BY and HAVING clauses.

Relational algebra equivalences
• An optimizer enumerates plans by applying several equivalences between relational algebra expressions.
• Selections
  • σ_{c1 ∧ c2 ∧ ... ∧ cn}(R) ≡ σ_{c1}(σ_{c2}(...(σ_{cn}(R)))) (Cascade)
  • σ_{c1}(σ_{c2}(R)) ≡ σ_{c2}(σ_{c1}(R)) (Commutative)
• Projections
  • Successively eliminating columns from a relation is equivalent to simply eliminating all but the columns retained by the final projection:
    π_{a1}(R) ≡ π_{a1}(π_{a2}(...(π_{an}(R)))) (Cascade), where a_i ⊆ a_{i+1}
• Cross-Product and Joins
  • R × S ≡ S × R and R ⋈ S ≡ S ⋈ R (Commutative)
  • R × (S × T) ≡ (R × S) × T and R ⋈ (S ⋈ T) ≡ (R ⋈ S) ⋈ T (Associative)
  • When joining several relations, we are free to join the relations in any order we choose.
• Select, Project and Join
  • π_a(σ_c(R)) ≡ σ_c(π_a(R)) (Commute), provided c involves only attributes retained by a
  • R ⋈_c S ≡ σ_c(R × S) (a join is a selection on a cross-product)
  • If the selection condition involves only attributes of one of the arguments (here R) of the cross-product or join:
    σ_c(R × S) ≡ σ_c(R) × S and σ_c(R ⋈ S) ≡ σ_c(R) ⋈ S
  • A selection can be replaced by a cascade of selections:
    σ_c(R × S) ≡ σ_{c1 ∧ c2 ∧ c3}(R × S) ≡ σ_{c1}(σ_{c2}(σ_{c3}(R × S))) ≡ σ_{c1}(σ_{c2}(R) × σ_{c3}(S))
    where c1, c2, c3 are the conjuncts of c: c1 involves attributes of both R and S, c2 only attributes of R, and c3 only attributes of S
  • π_a(R × S) ≡ π_{a1}(R) × π_{a2}(S) (Commute), where a1 is the subset of a in R and a2 is the subset of a in S
  • π_a(R ⋈_c S) ≡ π_{a1}(R) ⋈_c π_{a2}(S), where a1 is in R and a2 is in S, provided every attribute in c appears in a
  • π_a(R ⋈_c S) ≡ π_a(π_{a1}(R) ⋈_c π_{a2}(S)), where a1 and a2 are the attributes of R and S, respectively, that appear in a or c

Cost estimation of query plan
• Cost estimation is required for each enumerated plan.
• For each node in the plan tree, we must estimate the cost of performing the corresponding operation. Costs are affected significantly by whether pipelining is used or temporary relations are created.
• For each node, we must also estimate the size of the result and whether it is sorted. This result is the input to the operation at the parent of the current node.
• The number of page I/Os is used as the unit of cost.
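To make the link between the equivalences and plan cost concrete, here is a minimal Python sketch (not from the slides) that evaluates σ_{c2 ∧ c3}(R × S) and the equivalent σ_{c2}(R) × σ_{c3}(S) on two tiny in-memory relations. The relation contents, the helper names `select`, `cross`, and `canon`, and the use of tuple counts as a stand-in for page I/Os are all assumptions made for illustration.

```python
from itertools import product

# Hypothetical toy relations (not from the slides); each row is a dict.
sailors = [
    {"sid": 1, "rating": 9},
    {"sid": 2, "rating": 5},
    {"sid": 3, "rating": 9},
]
boats = [
    {"bid": 101, "color": "red"},
    {"bid": 102, "color": "green"},
]

def select(pred, rel):
    """Selection (sigma): keep the rows that satisfy pred."""
    return [row for row in rel if pred(row)]

def cross(r, s):
    """Cross product: merge every row of r with every row of s."""
    return [{**a, **b} for a, b in product(r, s)]

def high_rating(row):  # c2: involves only attributes of Sailors
    return row["rating"] == 9

def is_red(row):       # c3: involves only attributes of Boats
    return row["color"] == "red"

# Plan A: apply the whole condition after building the cross product.
cross_a = cross(sailors, boats)
plan_a = select(lambda row: high_rating(row) and is_red(row), cross_a)

# Plan B: push each selection below the cross product.
left = select(high_rating, sailors)
right = select(is_red, boats)
plan_b = cross(left, right)

def canon(rel):
    """Order-independent form of a relation, for comparing results."""
    return sorted(tuple(sorted(row.items())) for row in rel)

same_result = canon(plan_a) == canon(plan_b)
intermediate_a = len(cross_a)            # tuples materialized before selecting
intermediate_b = len(left) + len(right)  # tuples that survive the early selections

print(same_result, intermediate_a, intermediate_b)
```

Both plans return the same rows, but plan B works with fewer intermediate tuples. A real optimizer would compare the plans in page I/Os rather than tuple counts.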
Estimation of result size
• Size estimation plays an important role in cost estimation, because the output of one operator can be the input to another operator and the cost of an operator depends on its input size.
  Ex: SELECT attr_list FROM rel_list WHERE term1 ∧ ... ∧ termn
• The maximum number of tuples in the result of the query is the product of the cardinalities of the relations in the FROM clause. Every term of the WHERE clause eliminates some of the potential result tuples.
• The actual size of the result can be estimated as the maximum size times the product of the reduction factors for the terms in the WHERE clause.

Computation of reduction factor
• column = value: the reduction factor can be approximated by 1/NKeys(I) if there is an index I on column for the relation in question, where NKeys(I) is the number of distinct key values for index I.
  ✓ If there is no index on column, the System R optimizer arbitrarily assumes that the reduction factor is 1/10.
• column1 = column2: the reduction factor can be approximated by 1/MAX(NKeys(I1), NKeys(I2)) if I1 and I2 are the indexes on column1 and column2 respectively.
  ✓ If only one of the two columns has an index I, the reduction factor is 1/NKeys(I).
  ✓ If neither column has an index, the reduction factor is 1/10.
• column > value: the reduction factor is approximated by (High(I) − value) / (High(I) − Low(I)) if there is an index I on column, where High(I) is the highest value and Low(I) is the lowest value in index I.
  ✓ If the column is not of an arithmetic type or there is no index, a fraction less than half is arbitrarily chosen.
• column IN (list of values): the reduction factor is the reduction factor for 'column = value' multiplied by the number of items in the list.
• Assumption: uniform distribution of values.
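The reduction-factor rules above can be sketched in a few lines of Python. This is an illustrative sketch only: the function names, the convention of passing `None` when a column has no index, the cap of 1 on the IN-list factor, and all statistics in the usage example are assumptions, not part of the slides.

```python
from math import prod

DEFAULT_RF = 1 / 10  # fallback from the slides when no index is available

def rf_col_eq_value(nkeys=None):
    """column = value: 1/NKeys(I) with an index, otherwise 1/10."""
    return 1 / nkeys if nkeys else DEFAULT_RF

def rf_col_eq_col(nkeys1=None, nkeys2=None):
    """column1 = column2: 1/MAX(NKeys(I1), NKeys(I2)).

    With a single index this reduces to 1/NKeys(I); with none, to 1/10.
    """
    known = [n for n in (nkeys1, nkeys2) if n]
    return 1 / max(known) if known else DEFAULT_RF

def rf_col_gt_value(value, low, high):
    """column > value with an index: (High(I) - value) / (High(I) - Low(I))."""
    return (high - value) / (high - low)

def rf_col_in_list(n_items, nkeys=None):
    """column IN (list): per-value factor times the list length.

    The cap at 1.0 is an assumption added here so that the factor
    remains a valid fraction; the slides do not mention a cap.
    """
    return min(1.0, n_items * rf_col_eq_value(nkeys))

def estimate_result_size(cardinalities, reduction_factors):
    """Maximum size (product of cardinalities) times the product of reduction factors."""
    return prod(cardinalities) * prod(reduction_factors)

# Hypothetical statistics: 1,000 sailors, 10,000 reservations,
# indexes with 1,000 and 500 distinct sid values, and 100 distinct bid values.
factors = [
    rf_col_eq_col(nkeys1=1000, nkeys2=500),  # S.sid = R.sid
    rf_col_eq_value(nkeys=100),              # R.bid = <some constant>
]
estimate = estimate_result_size([1000, 10000], factors)
print(round(estimate))
```

With these made-up statistics the estimate is about 100 tuples, compared with a maximum of 10,000,000 for the bare cross product.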