0% found this document useful (0 votes)

39 views

Advanced Query Tuning Using IBM Data Studio

- The document discusses using IBM Data Studio to improve SQL query performance through query tuning. - It describes how the DB2 optimizer determines access paths and how developers can focus on aspects like indexes, predicates, filter factors, and statistics to influence the access path. - The key aspects of explaining queries in Data Studio are covered, including generating and viewing access path graphs to analyze query execution.

Uploaded by

Dmitry

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

39 views

Advanced Query Tuning Using IBM Data Studio

Uploaded by

Dmitry

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 64

Advanced Query Tuning with IBM Data Studio

Tony Andrews
Themis
Session code: E04
Monday, Nov 05, 2018 (16:30 AM - 17:30) Platform: Cross Platform

1
Objectives
By the end of this presentation, you should:

• Give developers a starting place for performance tuning

• Know how to use Data Studio to help improve query performance.
• Know the different access paths and understand how they are
presented
• Understand filter factors
• Better understand how the DB2 optimizer determines access paths
• Better understand how to use and navigate Data Studio for SQL
tuning

2
Improving SQL Performance
• System Tuning

• Change the SQL

• Gather / Alter Statistics

• Change Physical Design

Improving SQL performance can be done in one of at least 4 ways. System tuning may be done to adjust
the parameters under which the DB2 subsystem operates to effectively match the workload. Altering
system parameters, tuning temporary space, and adjusting bufferpool sizes and thresholds are all examples
of this type of tuning. An appropriately tuned system can affect an improvement in performance. Most of
the time, however, other factors dominate a tuning scenario. The SQL itself must be written in a way that
may be processed efficiently by the database. An appropriate level of statistics about the data must be
gathered to tell the optimizer about the nature of the data being accessed. Lastly, the way the physical
objects are defined must be aligned with the types of queries that are to be performed.

3
Developers Should Focus On

• Appropriate use of indexes • Clustering order of data

• Predicate Types • Knowing ‘why’ any table space scan
• Access Path Choice • Stage 1 Predicates / Stage 2 / Residual
• Filter Factors • Minimal Sorts
• Known Statistics • Possible Rewrites
• Data Outliers

Developers should focus on all of these areas when tuning a query.

No amount of system tuning, however, can recover the resources wasted by a poor database design or
poor access paths generated by the DB2 optimizer. This presentation focuses on optimization and tuning
at the SQL level. In general, we want the optimizer to generate an access path that eliminates as much
data from consideration as early as possible in the process, takes advantage of indexes, and has up to date
statistics to help its access path choices..

Creating appropriate indexes on columns or groups of columns that are commonly used to identify needed
data can significantly reduce the I/O and CPU needed to retrieve a result.

Paying attention to well written predicates are important (Stage 1 vs Stage 2)

Paying attention to any filter factors that are not close to the reality of data being processed.

Knowing the clustering order of the tables is so important. Having tables clustered by primary keys is not
always what we want for applications. How is the table queried? What are the SQL statements and joins
to a table?

Look at the queries that gather medium to large result sets, not OLTP queries to help determine.

4
Optimizer
Catalog Statistics
Object Definitions

Access Path
Rid Pools, Sort Pools/Heap
CPU Speed, # CPUs
Access Path Hint

Through the Data Manipulation Language (DML) the user of a DB2 database supplies the “WHAT”; that
is, the data that is needed from the database to satisfy the business requirements. DB2 then uses the
information in the DB2 Catalog to resolve “WHERE” the data resides. The DB2 Optimizer is then
responsible for determining the all important “HOW” to access the data most efficiently.

Ideally, the user of a relational database is not concerned with how the system accesses data. This is
probably true for an end user of DB2, who writes SQL queries quickly for one-time or occasional use. It
is less true for developers who write application pro-grams and transactions, some of which will be
executed thou-sands of times a day. For these cases, some attention to DB2 access methods can
significantly improve performance.
DB2’s access paths can be influenced in four ways:

♦ By rewriting a query in a more efficient form.

♦ By creating, altering, or dropping indexes.
♦ By updating the catalog statistics that DB2 uses to estimate access costs.
♦ By utilizing Optimizer Hints.

As you can see there are many variables that can affect an access path choice outside of the actual SQL
code. It is always best to have a test environment that looks like production (at least data statistics wise)
so when queries and packages are eventually promoted to production, there are no last minute
surprise/different access paths.

5
Explain
EXPLAIN PLAN SET QUERYNO = 10 FOR
SELECT LASTNAME,SALARY
FROM EMP
WHERE EMPNO BETWEEN '000000' AND '099999'
Optimizer
AND SALARY < 40000

BIND PACKAGE with option

EXPLAIN(YES) z/OS LUW
PLAN_TABLE EXPLAIN_STATEMENT
DSN_STATEMNT_TABLE EXPLAIN_PREDICATE
DSN_FUNCTION_TABLE & a bunch of “other” tables
& a bunch of “other” tables

The process of asking the DB2 optimizer to describe an access path that was chosen (or will be chosen)
for a query is called an explain. When we run an explain, the output is placed in DB2 tables that we may
then view.

A PLAN_TABLE is a regular DB2 table that holds results of an EXPLAIN. IBM’s SQL Reference Guide
contains a format for the PLAN TABLE and a description of all the columns. Each user running an
explain needs access to a plan table either by owning one directly or through a secondary authid. In DB2
Version 8, aliases may also be used to allow users to share a single set of explain tables.

Although the plan table is required to run explains, there are also several other explain tables which may
optionally be created to hold explain data. These extra tables will be populated during an explain if they
exist.

The DSN_STATEMNT_TABLE contains information about the total perceived cost of the query being
explained. This cost data may be compared for several iterations of refinement for a query to see if it
might improve the performance. Costs may also be tracked over time.

Additionally, as of V8 there are many more explain tables that will be populated if they are created. IBM
has recently documented the contents of some of these extra tables. Their primary purpose is to supply
additional information for optimization products like V8 Visual Explain, V9 Optimization Service Center
(OSC) and Data Studio, and V10 Data Studio.

6
Data Studio Explaining Queries
Open Visual
Explain

Select Subsystem
Connection

You can run your query from any database connection. Select Configuration for which sub system you
want to execute against. You
must be connected to the sub system. In this example 2 sub systems are currently connected to.

You type in your query and click on the ‘Open Visual Explain’ button (top right)

The Visual Explain gets generated (bottom right)

To expand the Visual Explain, click the maximize

7
z/OS Data Studio Access Path Graphs

The access path graph shows the explain data for a query. Access path graphs are read left to right,
bottom to top. Each of the nodes on the graph represent a source of data or an operation on data as it
moves towards the result set. Each node may be clicked to provide details about that node on the left side
of the screen.

8
LUW Data Studio Access Path Graphs

The is the same query except explained in LUW. Notice:

Costing is in timerons: This is some proprietary algorithm that contains some weighing factors between
I/O and CPU.

Bottom numbers in the visual explain nodes are timeron costings. Top numbers are cardinality numbers.
For example the IXSCAN
node shows 3 numbers.
- 13.56 for the timeron cost
- 1.085 estimated cardinality of RIDs that meet the criteria
- 2.00 cumulative I/O timeron cost

9
Data Studio Access Path Graphs

Sources of Data

In this graph, there are two nodes that indicate they are a source of data. Data is accessed by doing an
index scan of the XEMP03 index. Once appropriate rows are identified then the data in the EMP table is
retrieved using the identifiers from the index.

If one of these nodes is highlighted by clicking on it the catalog data about the object is displayed on the
left side of the screen. Statistical information is displayed as well as the timestamp when statistics were
last gathered when connected to a Db2 z/OS subsystem.

In Db2 LUW, there are different numbers that show within the nodes. For IXSCAN and FETCH nodes:
Costing is in timerons:
This is some proprietary algorithm that contains some weighing factors between I/O and CPU.
Bottom numbers in the visual explain nodes are timeron costings. Top numbers are cardinality numbers.
For example the IXSCAN
node for this same query in LUW shows 3 numbers.
- 13.56 for the timeron cost
- 1.085 estimated cardinality of RIDs that meet the criteria
- 2.00 cumulative I/O timeron cost

By navigating the tree at the top left of the screen, information may be viewed about the table, tablespace
and any other indexes that exist on the referenced table.

10
Data Studio Access Path Graphs

If one of these nodes is highlighted by clicking on it, the catalog data about the object is displayed on the
left side of the screen. Statistical information is displayed as well as the timestamp when statistics were
last gathered.

By navigating the tree at the top left of the screen, information may be viewed about the table, tablespace
and any other indexes that exist on the referenced table.

11
Data Studio Access Path Graphs

Data Retrieval
Operations

The other nodes on the diagram represent operations on the data. If one of these nodes is selected the left
side of the screen will show the metrics that DB2 used in determining that this was the appropriate access
method. Predicate level data is shown as well as row estimates for how many rows will be passed to the
next operation. These estimates may then be compared to reality to determine if the optimizer made a
good choice.

Where do predicates get applied?

1 At the index level (IXSCAN) index predicates
2 At the data level (FETCH) data predicages

12
z/OS Stage 1, Stage 2 Predicates

Predicates in z/OS get pit into 2 categories, with stage 2 predicates on average about 10% more expensive
to evaluate. Many stage 2 predicates can be rewritten as stage 1 predicates, some cannot. It is always
important to know if there exists any stage 2 predicates within the query.

Db2 LUW does not have stage 2 predicates. They do however have residual predicates, being the most
expensive of predicates from this platform. Predicate types for LUW:

1) Range delimiting predicates: The start and stop key values in an index search. Evaluated by the index
manager.
2) Index sargable: Other index column predicates. These predicates are also evaluated by the index
manager.
3) Data sargable: Predicates applied not at the index level by the index manager, but evaluated at the data
level through the Data Management Services (DMS). Similar to z/OS stage 1.
4) Residual predicates: Predicates applied last and sometime requiring additional I/O (such as large
objects). They also are predicates that often times can be rewritten more efficiently. For example:
Subqueries with ANY, ALL, SOME, or IN). These predicates are evaluated by Relational Data Services
(RDS) and are the most expensive of the four categories of predicates. Similar to z/OS stage 2.

13
z/OS Stage 2 Predicates Rewrites

value NOT BETWEEN COL1 AND COL2

value BETWEEN col expr and col expr
COL op ANY
COL op ALL

There are far fewer stage 2 predicates than in previous releases of Db2. Here a few of the many that
remain that can be controlled and rewritten by developers.

The following will be rewritten by the optmizer:

Where deptno not between value and value

14
LUW Predicates – Sargeable (similar to S1)

Db2 LUW does not have stage 2 predicates. They do however have residual predicates, being the most
expensive of predicates from this platform. Predicate types for LUW:

Notice here that the HIREDATE – 7 DAYS is a stage 2 predicate in z/OS, yet Sargable in LUW.

15
Data Studio Access Path Graphs

Where do the filter factors come from ?

Do any frequency values exists for LASTNAME data?
Is it defaulting to 1/column cardinality?
FIRSTNME has 260 values. 1/260 = .0038

This screen shows filter factors for the index predicates. How did the optimizer come up with .0024 for
the LASTNAME column?

In the statistics, it shows there are 947 different LASTNAME values. This query is looking for 1 named
‘Smith’. Since LASTNAME is the first column of an index, it automatically calculates the top 10 names
in the data. The value ‘Smith’ is one of them and has .0024 percent of the data.

For FIRSTNME = ‘Joe’, all Db2 knows is that there are 260 different first names, and the query is
looking for 1 named ‘Joe’. Without any further information of specific first name values, it assumes they
are all evenly distributed, and takes 1/260 = .0038.

This 1/cardinality is common when using host variables and/or parameter markers ‘?’.

16
Data Studio Access Path Graphs

A great thing about clicking on the table node is the ability to see everything about the table:
-Table level statistics and object information
-Column attributes and statistics
-Indexes and their columns
-Tablespace attributes
- Partitioning Information
-Any special statistics like (Freqval, Histogram, Colgroup)

17
Tablespace Scan
SELECT EMPNO, LASTNAME, SALARY
FROM EMP
WHERE EMPNO BETWEEN ‘000000’ AND ‘099999’
AND SALARY < 40000
PLAN METHOD TNAME ACCESS MATCH ACCESS INDEX PREFETCH
NO TYPE COLS NAME ONLY
1 0 EMP R 0 N S

Data Studio
Estimated number
of rows
Plan Table

Tablespace scans are illustrated through EXPLAIN by ACCESSTYPE = R and PREFETCH = S.

The first query on the previous page illustrates a tablespace scan. The query has a predicate, however
there are no matching indexes on the HIREDATE column.
Through the PLAN_TABLE output, there is no way to differentiate a segmented from a non-segmented
tablespace scan. A table-space must have been created as segmented in order to benefit from the
performance advantages. To change from non-segmented to segmented after a table has been created, it
must be dropped and recreated.

When Tablespace Scans are Appropriate

Tablespace scan access is selected by DB2 typically when;
♦ A matching index scan is not possible because there are no indexes or there are no predicates to that
match the index columns.
♦ A high percentage of the rows in the table qualify.
♦ The indexes that have matching predicates have low cluster ratios, making them efficient only when a
small number of rows qualify.
♦ The table is small, or the statistics say the table is small

The Visual Explain access path graph for a tablespace scan is also shown. Notice that Visual Explain
gives estimates of how many rows will remain at each level.

18
Why Would the Optimizer Choose a Table Space Scan?
1. Are any predicate(s) poorly coded in a non-indexable way that takes away any possible
index choices from the optimizer?
2. Do the predicates in the query not match any available indexes on the table? Know your
indexes on a table!
3. The table could be small and Db2 decides a table scan may be faster than index processing.
4. The catalog statistics could say the table is small. This is more common in test
environments where the Runstats utility is not executed very often.
5. Are the predicates such that Db2 thinks the query is going to retrieve a large enough
amount of data that would require a table scan? Some explain tools will show the number
of rows Db2 thinks will be returned in the execution of a query (the IBM Data Studio tool is
very good at this).

Developers should:
- Know their data (cardinalities, columns with uneven distributions of data, etc.)
- Know the indexes on tables involved in their queries
- Know the clustering order of data in their tables
- Know the partitioning of data in their tables

19
Why Would the Optimizer Choose a Table Space Scan?
6. Are the predicates such that Db2 picks a non-clustered index, and the rows needed are
scattered throughout the table file such that the number of data pages to retrieve is high
enough based on total number of pages in the table to require a table scan? Know how
data is physically clustered in the tablespace!
7. Are the tablespace files or index files physically out of shape and
need a REORG?
8, Are there no predicates? So the query wants all the rows.
9. Sometimes there are just too many conditions in the logic to return the
results needed any other way. This is quite typical with many predicates
that are OR’d together.

20
Some predicates get rewritten for indexability
1 WHERE DEPTNO NOT BETWEEN ‘C01’ and ‘D11’ gets rewritten as
WHERE DEPTNO < ‘C01’ or DEPTNO > ‘D11’

2 WHERE SUBSTR(DEPTNO,1,1) = ‘C’ gets rewriten as

WHERE DEPNO >= ‘C..’ and DEPTNO <= ‘C..’

3) WHERE YEAR(HIREDATE) = 1990 gets rewritten as

WHERE HIREDATE BEWEEN ‘1990-01-01’ and ‘1990-12-31’

4) WHERE ‘1990-01-01 BETWEEN BIRTHDATE and HIREDATE get rewritten as

WHERE BIRTHDATE <= ‘1990-01-01’ and HIREDATE >= ‘1990-01-01’

5) Others…

21
z/OS Index Scan - Matching SELECT * FROM EMP
WHERE LASTNAME = ?
AND FIRSTNME = ‘Nichelle’;
PLAN_TABLE
PLAN METHOD TNAME ACCESS MATCH ACCESS INDEX PREFETCH
NO TYPE COLS NAME ONLY
1 0 EMP I 2 XEMP03 N

Notice FF of
.00105 for
LASTNAME
Predicate.
1/947 = .00105

Matching Index scans are depicted in the PLAN_TABLE by ACCESSTYPE = I, I1, N, or MX and
MATCHCOLS > 0.

For a Matching Index Scan, DB2 has determined that the query uses predicates that match index columns.
In general, the matching predicates on the leading index columns are equal or IN predicates. The predicate
that matches the final index column can be an equal, IN, or a range predicate (<, <=, >, >=, LIKE, or BE-
TWEEN).

The query on the previous page illustrates matching index access. Assume the table EMP has an index;
XEMP03 on (LASTNAME, FIRSTNME, MIDINIT). The index XEMP03 is the chosen access path for
this query, with MATCHCOLS = 2. There are two equal predicates on the first two columns of the index.

In Visual Explain the IXSCAN detail shows which predicates were used to match columns along with
their filter factors. Row estimates are computed and displayed based on the available statistics for the
table and index. This slide only shows the first matching predicate.

The value of MATCHCOLS is used to determine the number of columns DB2 can match to predicates in
the query. Typically, index access will be more efficient the greater the number of matching columns.

Effort placed on proper index design can have a huge return on investment in terms of the efficiency of
DB2’s ability to utilize matching indexes to query predicates.

22
z/OS Index Screening INDEX XEMP03 on
(LASTNAME, FIRSTNME, MIDINIT)
SELECT * FROM EMP
WHERE LASTNAME = ‘Coldsmith’
AND MIDINIT = ‘R’;

Index Screening
Predicate

PLAN_TABLE
PLAN METHOD TNAME ACCESS MATCH ACCESS INDEX PREFETCH
NO TYPE COLS NAME ONLY

1 0 EMP I 1 XEMP03 N

▪ Index Screening

Index screening predicates are specified on index key columns, but are not part of the matching columns
used to scan the index structure. These screening predicates improve index access by reducing the number
of rows that qualify while searching the index.

Assume the table EMP has an index; XEMP03 on (LASTNAME, FIRSTNME, MIDINIT);

The query on the previous page illustrates DB2s ability to use one of the two predicates matching against
the index, i.e. with MATCHCOLS = 1. Once DB2 determines that a symbolic key entry matches on the
predicate LASTNAME = ?, the predicate MIDINIT = ? can be applied during the index scan to further
qualify rows. This is the process known as index screening. If a row meets the criteria of these screening
predicates, the row will be retrieved. Once the data row has been retrieved, predicates for columns not in
the index can be applied.

▪ When Index Screening is used

♦ When there are predicates available to apply against columns in the index to further qualify rows.
♦ The PLAN_TABLE does not directly tell when an index is screened. However, if the MATCHCOLS is
less than the number of in-dex key columns, this indicates index screening is possible. Visual Explain
does flag predicates where index screening is used.

23
z/OS Index Screening (cont)

24
LUW Index Scan - Matching (Start/Stop Keys)
SELECT * FROM EMP
WHERE EMPNO BETWEEN ‘000000’ and ‘099999’

You know when you have matching index access by seeing start and stop keys in the index predicate
information

25
LUW Index Screening
SELECT * FROM EMP
WHERE LASTNAME = ‘Smith’
AND MIDINIT = ‘R’

Index screening in LUW shows a Start position (matching of 1 in this example) and then the other
index predicate as a Sargable predicate.

26
z/OS Index Scan - Nonmatching
SELECT * FROM EMP
WHERE FIRSTNME = ‘Michelle’
AND MIDINIT = ‘R’;

PLAN_TABLE
PLAN METHOD TNAME ACCESS MATCH ACCESS INDEX PREFETCH
NO TYPE COLS NAME ONLY

1 0 EMP I 0 XEMP03 N

Nonmatching Index scans are described through EXPLAIN by ACCESSTYPE = I and MATCHCOLS = 0.

In Visual Explain it is possible to see which columns are used as screening predicates in a nonmatching
index scan. Notice that MATCHCOLS shows up as zero in both the PLAN_TABLE and Visual Explain.

When Nonmatching Index Access is used

Because there is little or no filtering, a nonmatching index scan is used. Sometimes called an index scan:

♦ When index screening is provided. In this case not all the data pages are accessed, but all index pages are
accessed. Then only those data pages that DB2 has determined qualified based on the scanning and
evaluating of the index predicates.

27
z/OS Index Scan - Nonmatching

SELECT * FROM EMP

WHERE FIRSTNME = ‘Michelle’
AND MIDINIT = ‘R’;

Note that on the visual Explain, a node ‘IXSCAN’ shows whether there is matching or not. So it is very
important to always click on the node to evaluate any matching and/or screening.

0 matching should be an indicator for further analysis to determine ‘Why’ no matching.

0 matching means the whole index file will be scanned.

28
LUW Index Scan – Non Matching (Sargable)
SELECT LASTNAME* FROM EMP
WHERE FIRSTNME = ‘David’ and MIDINIT = ‘A’

No Start/Stop

When you do not code the leading column of the index, and DB2 still chooses to use the index, the
other Predicates show up as non starting and stopping keys. In this case SARGEABLE predicates
against the index column(s).

29
Index Only Access

SELECT LASTNAME, FIRSTNME, MIDINIT

FROM EMP
WHERE LASTNAME LIKE 'Jo%'

If all the columns needed for a particular table in a query are available in an index, the optimizer may be
able to qualify and retrieve the columns from the index without going to the tablespace at all. This is
called index only access and may provide a significant performance improvement, particularly when many
rows need to be evaluated using a non-clustered index.

In this example only 1 column in XEMP03 is being used to qualify rows, but placing the FIRSTNME and
MIDINIT columns in the index provides significant benefit to this query.

Note that there is no table node for the EMP table in this diagram since only the index is used.

30
z/OS Nested Loop Join

A nested loop join is described in the PLAN_TABLE through METHOD = 1. An inner join is described
in the PLAN_TABLE through JOIN_TYPE = ‘ ‘. An outer join is described in the PLAN_TABLE
through JOIN_TYPE <> ‘ ‘.

In Visual Explain there will be a join node labeled NLJOIN to describe a nested loop join. In the example
shown the optimizer estimates that only 1 row will qualify from the department table and match to only 1
row on the employee table. This likely explains why this join method was chosen.

31
Nested Loop Join
LUW Nested Loop Join

32
z/OS Merge Scan Join

33
z/OS Hybrid Join

SELECT LASTNAME, PROJNO

FROM EMP E JOIN EMPPROJACT EPA
ON E.EMPNO = EPA.EMPNO
WHERE E.JOB = ‘FIELDREP’

Hybrid join works well when:

•The outer table qualifies many rows
•A non-clustered index exists on the join key for the inner table
•Many rows will be eliminated as a result of the join

34
LUW HASH JOIN

SELECT LASTNAME, PROJNO

FROM EMP E JOIN DEPT D
ON E.DEPTNO = D.DEPTNO
WHERE E.JOB = ‘FIELDREP’

For batch jobs that process large portions of tables, hash joins are often the best choice. Nested loop joins
are good when there are indexable predicates on the inner table and the inner table is clustered in the same
order as the outer table. Otherwise, hash joins can perform much better. This is because there is one pass
of each table and sorts are not needed. A single pass of a table using prefetch is the best you can do if you
need to process the entire table.

DB2 builds a hash table in memory by reading one of the tables (the inner table – the right hand side of
the access path graph. The hash table is a set of memory blocks (called buckets).
▪ Each key is hashed by an algorithm and the key and needed columns are stored in the corresponding
bucket in the hash table.
▪ More than one key can hash to the same bucket. DB2 will maintain a chain of those keys.
The second table (outer or left and side) is read
▪ The keys are hashed and the bucket is searched for a matching key.
▪ For each matching key, DB2 will pass the joined data to the next step in the access path graph.

35
Which Join Method

1) Depends on the predicates

2) How much filtering on the tables
3) Possible indexes
4) Optimization level
5) Clustering of table data

Hybrid join works well when:

•The outer table qualifies many rows
•A non-clustered index exists on the join key for the inner table
•Many rows will be eliminated as a result of the join

36
Sort Activities

Data Sorts RID Sorts

✓ORDER BY
✓ List Prefetch
✓GROUP BY
✓ Multiple Index Access
✓DISTINCT
✓ Hybrid Join
✓UNION
✓Subqueries
✓JOIN

Whenever a sort is seen in the explain output, 2 questions should always come to mind:

1. Is the sort needed in this query

2. If so, how many rows are going into the sort. The more rows and columns selected, the
more expensive the sort.

37
z/OS Data Sorts via Data Studio

Sorts are represented in Visual Explain by red nodes that indicate the reason for the sort and an estimate
of the number of records to be processed by the sort. The number 51834 represents the guesstimated
number of rows going into the sort based on the ‘Where’ criteria.

38
LUW Data Sorts via Data Studio

Sorts are represented in Visual Explain by red nodes that indicate the reason for the sort and an estimate
of the number of records to be processed by the sort.

39
Predicate Generation Through Transitive Closure
The Premise
If A must equal B
And A must be RED,
Then B must also be RED.

40
Predicate Generation Through Transitive Closure Cont’d
Single Table DB2 Generated Predicate

Index XDEPT1 on DEPTNO

Index XDEPT3 on ADMRDEPT

SELECT . . . . SELECT . . . .
FROM DEPT FROM DEPT
WHERE DEPTNO = ADMRDEPT WHERE DEPTNO = ADMRDEPT
AND ADMRDEPT = ‘A00’ ; AND ADMRDEPT = ‘A00’
AND DEPTNO = ‘A00’ ;

XDEPT1 index chosen !

41
Predicate Generation Through Transitive Closure Cont’d
Single Table DB2 Generated Predicate

Index XDEPT1 on DEPTNO

Index XDEPT3 on ADMRDEPT

SELECT . . . . SELECT . . . .
FROM DEPT D, EMP E FROM DEPT D, EMP E
WHERE D.DEPTNO = E.DEPTNO WHERE D.DEPTNO = E.DEPTNO
AND D.DEPTNO IN (‘A00’ AND D.DEPT IN (‘A00’,
‘B01’, ‘B01’,
‘C11’) ; ‘C11’)
AND D.DEPTNO IN (‘A00’,
‘B01’,
‘C11’) ;

42
Predicate Transitive Closure

SELECT . . . .
FROM DEPT
WHERE DEPTNO = ADMRDEPT
AND ADMRDEPT = ‘A00’ ;

Note: Index on
DEPTNO chosen

Visual Explain will show all predicates used by the optimizer, both those included in the query and those
generated by predicate transitive closure.

43
LUW Predicate Transitive Closure

Visual Explain will show all predicates used by the optimizer, both those included in the query and those
generated by predicate transitive closure.

44
Tuning a Query

Tune a Query

In z/OS you have to go into ‘Tune a query’ in order to see the original and transformed SQL.

45
Tuning a Query

Notice some of the options are inaccessible in the free standalone version. Click on the ‘Select What to
Run’to initiate further tuning of the SQL statement.

Also notice settings for:

SQLID - Set to the owner of plan table s for the tool to use
SCHEMA – Set to the owner of tables involved in the query

46
Tuning a Query Output

All the output from ‘Tune a Query’.

Note some are not available in ‘no
charge’ version.

After executing the query tuning, options open up for analysis (Formatted Query, Access Plan Graph,
Summary Report, Statistics Advisors, etc.)

47
Statistics Advisor

Opens Advisor
details tab to view
information

Click on this line to expand

statistics recommendations

Statistics Advisories are noted by priorities. Each can be clicked on to provide more details and specific
recommendations.

48
Statistics Advisor Summary Report

Statistics Advisories are noted by priorities. Each can be clicked on to provide more details
And specific recommendations.

49
z/OS Tune a query – Query Transformation

Open formatted query is one the best features in query tuning. DB2 optimization has always rewritten
(transformed) queries a t times in the past, and now the rewrite (if any) is being externalized.

This is especially important with the V9 Global Query Optimization enhancement.

50
z/OS Tune a query – Query Transformation
Subquery transformation may also takes place in LUW

Note: Note:
Non Correlated Correlated

Open formatted query is one the best features in query tuning. DB2 optimization has always rewritten
(transformed) queries at times in the past, and now the rewrite (if any) is being externalized. But by no
means should we cut and paste the transformed query into the code. This is because along with the
Transformed rewrite are other ‘hidden’ pieces of information that get passed to the access path selection
step. The transformed query does tell the whole tranformation.

This is especially important with the V9 Global Query Optimization enhancement.

51
Case #1: 2 Possible Indexes
SELECT * FROM EMP
WHERE LASTNAME = 'Smith'
AND FIRSTNME = 'Joe'
AND DEPTNO = 'A00';

XEMP02 Col XEMP03 Cols

51,834 * .0092 = 477 rows 51,834 * .001 * .0057 = < 1 row
Winner!

IBM Data Studio can often tell us the ‘Why’ of a certain access path choice. In this example DB2 chose
the XEMP03 index because based on statistics, it was showing the lest amount of rows that met the
criteria. So the explain looks good, but could run better ……

In this example there are two possible indexes that could be used to retrieve the desired result. Index
XEPM02 contains column DEPTNO while index XEMP03 contains a concatenation of LASTNAME,
FIRSTNME and MIDINIT. Db2 could choose to use multiple index access, but this is unlikely unless
neither index provides a high degree of filtering. By computing the filter factors for the 2 available
indexes using the formula for uniform distribution, Db2 determines that about 447 rows will qualify for
the DEPTNO predicate and less than one row will qualify for the combination of lastname and firstname
requested. Since XEMP03 is perceived to narrow the result to one row, it is chosen as the index to be
used for qualifying the rows.

52
Comparing Estimates with Reality
SELECT COUNT(*) 1,289 rows
FROM EMP
WHERE LASTNAME = 'Smith' (Db2 est. <1)
AND FIRSTNME = 'Joe';

SELECT COUNT(*) 4 rows

FROM EMP
WHERE DEPTNO = 'A00'; (Db2 est. 477)

If this query is not performing optimally, it may be useful to compare the filter factors presented in Data
Studio with actual row counts on the tables involved. In this case, the optimizer’s estimate of 1 row for
index XEMP03 was incorrect (1289 rows actually exist). The estimate for XEMP02 of 477 qualifying
rows was also incorrect (actual count was only 4).

In this case a poor filter factor estimate led to the wrong index being selected.

53
Additional Statistics
A00 has far fewer rows than
the average dept
RUNSTATS INDEX (THEMIS82.XEMP02)
KEYCARD FREQVAL NUMCOLS 1 COUNT 5 BOTH

RUNSTATS INDEX (THEMIS82.XEMP03)

KEYCARD FREQVAL NUMCOLS 2 COUNT 5 MOST
Joe Smith has far more rows than the
average name combination

Uneven distribution of data in this table is the reason for the poor filter factor estimates. In this case
department A00 has far fewer occurrences than the average department. Joe Smith also occurred far more
frequently than the average name combination. Additional index statistics may be gathered to inform the
optimizer of the statistical outliers. In the first control card, the runstats utility is being requested to
capture both the 5 most frequently occurring values and the 5 least frequently occurring values for
DEPTNO and store these values in SYSCOLDIST. The second control card is requesting the top 5
combinations for the first two columns of XEMP03 (LASTNAME and FIRSTNME).

54
Frequency Stats

Once these statistics have been gathered they may be viewed using Data Studio in the Coldist folder for
the column statistics. The appropriate filter factor for department A00 is .000077 rather than .0092 as
calculated using uniform distribution rules. This leads Db2 to estimate that about 4 rows will qualify for
this predicate, which is exactly what the actual count revealed.

55
Result of Adding Frequency Stats

Better Index
Chosen

With the additional statistics in place, the optimizer chooses XEMP02 to accomplish the filtering instead
of XEMP03 resulting in better performance. More rows are eliminated from consideration earlier in the
process using this access path.

56
Case #2: Join Order
SELECT *
FROM EMP E, PROJ P
WHERE E.EMPNO = P.EMPNO
AND E.COMM = 0
AND P.PRSTAFF > 3

A similar problem can occur when joining tables together even when the predicates involved are not
indexed. In this example, the EMP table is being joined to the PROJ table. Local predicates exist on both
tables, so Db2 must make what appears to be a narrow decision on which table should be accessed first.

There exist an index on PRSTAFF with some frequency values statistics.

57
Comparing Estimates with Reality
SELECT COUNT(*) # Rows on EMP 51,803 rows
FROM EMP
WHERE COMM = 0; (Db2 est. 1,620)

SELECT COUNT(*) 14,598 rows

FROM PROJ
WHERE PRSTAFF > 3; (Db2 est.
50,242 * .2906 =
14,600)
Filter Factor

The optimizer calculates that 1620 rows will qualify in the EMP table using formulas for uniform
distribution. Actual counts reveal that 51,803 rows actually exist that meet the requested condition. Zero
appears to be a default value for COMM that occurs much more often than any other value. This is a
fairly common condition. Many columns have default or null values that occur much more often than
anything else.

58
Additional Statistics
COMM = 0 appears far more often
than other values appear
RUNSTATS TABLESPACE THEMIS82.TS00EMP
TABLE(THEMIS82.EMP)
COLUMN(COMM)
COLGROUP(COMM) FREQVAL COUNT 1
SORTDEVT SYSDA SORTNUM 4

Runstats needed on non

uniform distribution of
COMM data

The syntax presented here will gather column level statistics for the COMM column and also gather the
most frequently occurring value for COMM (COLCOUNT 1). Note that runstats must sort the data to
obtain this information so sort parameters may need to be included in the syntax.

Also notice that only a count of 1 is needed since there is only 1 value that contains 99% of the data. For
developers in coding predicate logic using this column, they would then need 2 different queries:

If :COMM-HV = 0 then
SELECT …….
FROM EMP
WHERE …….
AND COMM = 0;
ELSE
SELECT …….
FROM EMP
WHERE …….
AND COMM = :HV-COMM
AND COMM <> 0;
END IF;

59
Adjusted Join Order
SELECT *
FROM EMP E, PROJ P
WHERE E.EMPNO = P.EMPNO
AND E.COMM = 0
AND P.PRSTAFF > 3

When the additional statistics are provided, the optimizer understands the true nature of the data on the
EMP table and chooses the PROJ table to be the first accessed since it provides much better filtering.

Again, if different values come in for COMM and they are put into a host variable, then whenever the
value is > 0, the following would need to be executed:

SELECT …….
FROM EMP
WHERE …….
AND COMM = :HV-COMM
AND COMM <> 0;

60
z/OS V11 Sparse Indexing
Example 1
Sparse
SELECT LASTNAME, DEPTNAME Index
FROM EMP E JOIN
DEPT D
ON E.DEPTNO = D.DEPTNO
WHERE E.SALARY > 45000

z/OS sparse index processing is similar to hash joining on other platforms (like DB2 LUW). This is
usually a good thing that the optimizer chooses.

The index is built in memory (called In-Memory-Data-Cache). Could overflow to a work file if the
entries in the sparse index are too many.
Sparse index/hash join can beat out nested loop if the inner table (after any local predicates) can be
contained in-memory (default for MXDTCACH is 20MB), and the join has enough rows from the
outer to inner to "pay back" the build cost of the sparse index/hash.

The benefit of sparse index processing is that each lookup to the inner (using the hash of the key)
happens in RDS (stage 2). Probing the inner table index (even if index only) has higher overhead than
probing the hashed sparse index.

61
z/OS 11 Sparse Indexing
WITH X AS Example 2
SELECT DEPTNO,
AVG(SALARY) AS AVG_SAL,
SUM(SALARY) AS SUM_SAL,
COUNT(*) AS NUM_EMPS
FROM THEMIS81.EMP
GROUP BY DEPTNO)

SELECT E.EMPNO,
E.LASTNAME,
E.DEPTNO,
E.SALARY,
X.AVG_SAL,
X.SUM_SAL
FROM THEMIS81.EMP E,
X
WHERE E.DEPTNO = X.DEPTNO
ORDER BY E.DEPTNO,E.SALARY DESC

z/OS sparse index processing is similar to hash joining on other platforms (like DB2 LUW). This is
usually a good thing that the optimizer chooses.

62
Thank You for Attending!

“There is always time for an Explain”

“I have noticed that when the developers get

educated, good SQL programming standards are in
place, program walkthroughs and Explains are executed
correctly, incident reporting stays low, CPU costs do not get out
of control, and most performance issues are found before
promoting code to production.”

63
Tony Andrews
Themis Inc.
tandrews@themisinc.com

Session code: E04

Please fill out your session

evaluation before leaving!

ABAP On HANA Interview Questions
100% (2)
ABAP On HANA Interview Questions
21 pages
Advanced Logic Synthesis: Multiple Choice Questions
No ratings yet
Advanced Logic Synthesis: Multiple Choice Questions
16 pages
McKinsey & Company - BU Strategic Plan Template Book - 2001
No ratings yet
McKinsey & Company - BU Strategic Plan Template Book - 2001
58 pages
All in One Informatica Questionnaire
75% (4)
All in One Informatica Questionnaire
83 pages
Oracle Database Administration Interview Questions You'll Most Likely Be Asked: Job Interview Questions Series
From Everand
Oracle Database Administration Interview Questions You'll Most Likely Be Asked: Job Interview Questions Series
Vibrant Publishers
5/5 (1)
Retreats Ora Grodsky and Jeremy Phillips
No ratings yet
Retreats Ora Grodsky and Jeremy Phillips
11 pages
IM Ch11 DB Performance Tuning Ed12
No ratings yet
IM Ch11 DB Performance Tuning Ed12
17 pages
.Ua Articles SQL SQL
No ratings yet
.Ua Articles SQL SQL
11 pages
Module 18 - Database Tuning
No ratings yet
Module 18 - Database Tuning
9 pages
Db2 SQL Tuning Tips
100% (1)
Db2 SQL Tuning Tips
11 pages
Db2 SQL Tuning
No ratings yet
Db2 SQL Tuning
26 pages
Query Processing and Optimization in Oracle RDB: Gennady Antoshenkov, Mohamed Ziauddin
No ratings yet
Query Processing and Optimization in Oracle RDB: Gennady Antoshenkov, Mohamed Ziauddin
9 pages
Oracle Performance Tuning Interview Questions
No ratings yet
Oracle Performance Tuning Interview Questions
7 pages
Informatica Performance Optimization Techniques
No ratings yet
Informatica Performance Optimization Techniques
21 pages
Oracle SQL Tuning PDF
50% (2)
Oracle SQL Tuning PDF
70 pages
bi-unit-4
No ratings yet
bi-unit-4
40 pages
Tuning Hierarchy - The Pyramid: Server/OS Level Tuning
No ratings yet
Tuning Hierarchy - The Pyramid: Server/OS Level Tuning
75 pages
1 Hend 4 F 3 Hru 8 Dfu 504 Un
No ratings yet
1 Hend 4 F 3 Hru 8 Dfu 504 Un
22 pages
Tuning SQL Statements Again
100% (6)
Tuning SQL Statements Again
9 pages
Oracle Database Performance Tuning FAQ
100% (1)
Oracle Database Performance Tuning FAQ
8 pages
SQL Interview Questions and Answers: Search Tech Interviews
100% (1)
SQL Interview Questions and Answers: Search Tech Interviews
6 pages
SQL Performance in ERP 11i
No ratings yet
SQL Performance in ERP 11i
14 pages
Database tuning1
No ratings yet
Database tuning1
5 pages
Five Performance Hints For Efficient SQL
No ratings yet
Five Performance Hints For Efficient SQL
7 pages
Perofrmance and Indexes Discussion Questions Solutions PDF
No ratings yet
Perofrmance and Indexes Discussion Questions Solutions PDF
5 pages
PL SQL Interview Questions
100% (2)
PL SQL Interview Questions
27 pages
Oracle Related Questions
No ratings yet
Oracle Related Questions
33 pages
+1. CA Performance Handbook For DB2 For zOS - CHAPTER 7 - Application Design and Tuning For Performance
No ratings yet
+1. CA Performance Handbook For DB2 For zOS - CHAPTER 7 - Application Design and Tuning For Performance
16 pages
Informatica Interview Questioner-Ambarish
No ratings yet
Informatica Interview Questioner-Ambarish
211 pages
Adl SQL Cheatsheet
100% (1)
Adl SQL Cheatsheet
3 pages
Odbc and SQL: Creating A Channel
No ratings yet
Odbc and SQL: Creating A Channel
9 pages
Database Testing Question & Answers: What Is Way of Writing Testcases For Database Testing?
No ratings yet
Database Testing Question & Answers: What Is Way of Writing Testcases For Database Testing?
3 pages
Query Processing
No ratings yet
Query Processing
20 pages
Database Testing Overiew - 2
No ratings yet
Database Testing Overiew - 2
11 pages
Application Tuning
No ratings yet
Application Tuning
11 pages
Oracle Cost-Based Optimizer Basics
No ratings yet
Oracle Cost-Based Optimizer Basics
52 pages
Department of Computer Science and Engineering: Comprehensive Viva-Voce Database Management System
No ratings yet
Department of Computer Science and Engineering: Comprehensive Viva-Voce Database Management System
9 pages
844 Abap4 Performance Tuning Tips and Tricks
No ratings yet
844 Abap4 Performance Tuning Tips and Tricks
20 pages
Manegemene
No ratings yet
Manegemene
26 pages
DWDM Record
No ratings yet
DWDM Record
83 pages
SQL Sop
No ratings yet
SQL Sop
8 pages
Understanding Performance Tuning in Oracle
100% (2)
Understanding Performance Tuning in Oracle
17 pages
Teradata SQL Tuning Ver 1
100% (2)
Teradata SQL Tuning Ver 1
19 pages
Chapter 9 Fundamentals of An ETL Architecture
No ratings yet
Chapter 9 Fundamentals of An ETL Architecture
8 pages
Query Optimization
No ratings yet
Query Optimization
10 pages
Oracle 10g Lab Simple
No ratings yet
Oracle 10g Lab Simple
93 pages
Oracle Qs
No ratings yet
Oracle Qs
6 pages
Power Bi
No ratings yet
Power Bi
5 pages
Data Visualisation and Reporting Using Tableau
No ratings yet
Data Visualisation and Reporting Using Tableau
52 pages
Oracle SQL Tuning 1230324983128347 2
100% (1)
Oracle SQL Tuning 1230324983128347 2
97 pages
Query Basics UNC
No ratings yet
Query Basics UNC
118 pages
Unit - V: Database Database Management System Storage Devices CPU Computers Network
No ratings yet
Unit - V: Database Database Management System Storage Devices CPU Computers Network
4 pages
Informatica Interview Questions and Answers
No ratings yet
Informatica Interview Questions and Answers
5 pages
Databricks Optimization Technique
No ratings yet
Databricks Optimization Technique
18 pages
Test
No ratings yet
Test
80 pages
SQL Interview Success From Beginner To Pro
From Everand
SQL Interview Success From Beginner To Pro
Shana
No ratings yet
Advanced SQL Performance Tuning: Optimize Your Database Workloads
From Everand
Advanced SQL Performance Tuning: Optimize Your Database Workloads
Robert Johnson
No ratings yet
Learn SQL: Database Management Basics
From Everand
Learn SQL: Database Management Basics
Kiet Huynh
No ratings yet
SAS Interview Questions You'll Most Likely Be Asked
From Everand
SAS Interview Questions You'll Most Likely Be Asked
Vibrant Publishers
No ratings yet
Oracle 11g Streams Implementer's Guide
From Everand
Oracle 11g Streams Implementer's Guide
Ann L. R. McKinnell
No ratings yet
SQL All-in-One For Dummies
From Everand
SQL All-in-One For Dummies
Allen G. Taylor
4/5 (1)
SAS Programming Guidelines Interview Questions You'll Most Likely Be Asked
From Everand
SAS Programming Guidelines Interview Questions You'll Most Likely Be Asked
Vibrant Publishers
No ratings yet
DBMS Lab Manual
From Everand
DBMS Lab Manual
Jitendra Patel
1.5/5 (3)
Dokumen - Tips - Extending Ibm Tivoli Identity Manager Ibm Tivoli Identity Manager 46 Using Javascript
No ratings yet
Dokumen - Tips - Extending Ibm Tivoli Identity Manager Ibm Tivoli Identity Manager 46 Using Javascript
31 pages
Stuc2150 609
No ratings yet
Stuc2150 609
77 pages
SIM 7.0 Lab Guide 1.01 20150609
No ratings yet
SIM 7.0 Lab Guide 1.01 20150609
96 pages
Isam Mmfa Cookbook 20181214
No ratings yet
Isam Mmfa Cookbook 20181214
250 pages
WAS31 WAS32 Web Sphere Performance Dataand Tools
No ratings yet
WAS31 WAS32 Web Sphere Performance Dataand Tools
39 pages
ISAM AppX Basic Appliance Cookbook
No ratings yet
ISAM AppX Basic Appliance Cookbook
46 pages
C6 - PureApp in Production - HADR, Monitoring, Enterprise Security
No ratings yet
C6 - PureApp in Production - HADR, Monitoring, Enterprise Security
42 pages
Welcome To Saurashtra University Naimish Sem 4 Bcom
No ratings yet
Welcome To Saurashtra University Naimish Sem 4 Bcom
1 page
Low Salicylates
No ratings yet
Low Salicylates
12 pages
Learning Episode 8
No ratings yet
Learning Episode 8
3 pages
AMUL
No ratings yet
AMUL
19 pages
Class 3 - Normal Distribution
No ratings yet
Class 3 - Normal Distribution
20 pages
A SAFE APPROACH TO HOISTING
No ratings yet
A SAFE APPROACH TO HOISTING
14 pages
(6C) Aircraft Structures
No ratings yet
(6C) Aircraft Structures
71 pages
Pressure Regulator Principles
0% (1)
Pressure Regulator Principles
7 pages
2013 Course Structure BTech CSE
No ratings yet
2013 Course Structure BTech CSE
32 pages
Makalah Inggris
No ratings yet
Makalah Inggris
19 pages
Progressinair Jetspinning
No ratings yet
Progressinair Jetspinning
40 pages
Assignment No 3 Review
No ratings yet
Assignment No 3 Review
9 pages
Jose Daniel G. Asuncion: 2FB Dapitan Regency Dapitan St. Sampaloc, Manila 20:00
No ratings yet
Jose Daniel G. Asuncion: 2FB Dapitan Regency Dapitan St. Sampaloc, Manila 20:00
3 pages
Waiver 2023
No ratings yet
Waiver 2023
1 page
Estimate S
No ratings yet
Estimate S
28 pages
WWW Who Int/workforcealliance/knowledge/toolkit/33 PDF
No ratings yet
WWW Who Int/workforcealliance/knowledge/toolkit/33 PDF
1 page
CBIP Paper-Transformer Standardization - NTPC
No ratings yet
CBIP Paper-Transformer Standardization - NTPC
4 pages
Proposal of Methylamine 99
No ratings yet
Proposal of Methylamine 99
20 pages
Untitled2.ipynb - Colab-Exp2
No ratings yet
Untitled2.ipynb - Colab-Exp2
2 pages
Before The Lights Go Out: A Survey of EMP Preparedness Reveals Significant Shortfalls
No ratings yet
Before The Lights Go Out: A Survey of EMP Preparedness Reveals Significant Shortfalls
15 pages
Stress Strain Diagram
No ratings yet
Stress Strain Diagram
8 pages
Animation in China History, Aesthetics, Media
No ratings yet
Animation in China History, Aesthetics, Media
273 pages
Engine Misfire
No ratings yet
Engine Misfire
6 pages
Scriptographer User Guide
No ratings yet
Scriptographer User Guide
7 pages
Jurnal Marketing Public Relation Upi
No ratings yet
Jurnal Marketing Public Relation Upi
16 pages
Practical Obstetrics and Gynaecology Handbook for O G Clinicians and General Practitioners 2nd Edition Thiam Chye Tan - Read the ebook online or download it to own the full content
No ratings yet
Practical Obstetrics and Gynaecology Handbook for O G Clinicians and General Practitioners 2nd Edition Thiam Chye Tan - Read the ebook online or download it to own the full content
76 pages
School Police 1
No ratings yet
School Police 1
5 pages