db2 Perf Tune 115
Performance Tuning
2020-08-19
IBM
Notices
This information was developed for products and services offered in the US. This material might be
available from IBM in other languages. However, you may be required to own a copy of the product or
product version in that language in order to access it.
IBM may not offer the products, services, or features discussed in this document in other countries.
Consult your local IBM representative for information on the products and services currently available in
your area. Any reference to an IBM product, program, or service is not intended to state or imply that only
that IBM product, program, or service may be used. Any functionally equivalent product, program, or
service that does not infringe any IBM intellectual property right may be used instead. However, it is the
user's responsibility to evaluate and verify the operation of any non-IBM product, program, or service.
IBM may have patents or pending patent applications covering subject matter described in this
document. The furnishing of this document does not grant you any license to these patents. You can send
license inquiries, in writing, to:
For license inquiries regarding double-byte character set (DBCS) information, contact the IBM Intellectual
Property Department in your country or send inquiries, in writing, to:
Each copy or any portion of these sample programs or any derivative work must include a copyright
notice as follows:
© (your company name) (year).
Portions of this code are derived from IBM Corp. Sample Programs.
© Copyright IBM Corp. _enter the year or years_.
Trademarks
IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business
Machines Corp., registered in many jurisdictions worldwide. Other product and service names might be
trademarks of IBM or other companies. A current list of IBM trademarks is available on the web at
"Copyright and trademark information" at www.ibm.com/legal/copytrade.shtml.
Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.
Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the
United States, other countries, or both.
UNIX is a registered trademark of The Open Group in the United States and other countries.
Java and all Java-based trademarks and logos are trademarks or registered trademarks of Oracle and/or
its affiliates.
Applicability
These terms and conditions are in addition to any terms of use for the IBM website.
Personal use
You may reproduce these publications for your personal, noncommercial use provided that all proprietary
notices are preserved. You may not distribute, display or make derivative work of these publications, or
any portion thereof, without the express consent of IBM.
Commercial use
You may reproduce, distribute and display these publications solely within your enterprise provided that
all proprietary notices are preserved. You may not make derivative works of these publications, or
reproduce, distribute or display these publications or any portion thereof outside your enterprise, without
the express consent of IBM.
Rights
Except as expressly granted in this permission, no other permissions, licenses or rights are granted, either
express or implied, to the publications or any information, data, software or other intellectual property
contained therein.
IBM reserves the right to withdraw the permissions granted herein whenever, in its discretion, the use of
the publications is detrimental to its interest or, as determined by IBM, the above instructions are not
being properly followed.
You may not download, export or re-export this information except in full compliance with all applicable
laws and regulations, including all United States export laws and regulations.
IBM MAKES NO GUARANTEE ABOUT THE CONTENT OF THESE PUBLICATIONS. THE PUBLICATIONS ARE
PROVIDED "AS-IS" AND WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED,
INCLUDING BUT NOT LIMITED TO IMPLIED WARRANTIES OF MERCHANTABILITY, NON-
INFRINGEMENT, AND FITNESS FOR A PARTICULAR PURPOSE.
Contents
Notices...................................................................................................................i
Trademarks...................................................................................................................................................ii
Terms and conditions for product documentation......................................................................................ii
Figures................................................................................................................ vii
Tables.................................................................................................................. ix
Analyzing data.................................................................................................................................... 516
Recovering from sustained traps.......................................................................................................516
Identifying db2diag log entries for a load operation.........................................................................517
Troubleshooting administrative task scheduler................................................................................520
Operation fails because database is currently in use....................................................................... 521
Troubleshooting compression........................................................................................................... 522
Troubleshooting global variable problems........................................................................................525
Troubleshooting inconsistencies.......................................................................................................527
Troubleshooting installation.............................................................................................................. 529
Troubleshooting license issues......................................................................................................... 532
Diagnosing and resolving locking problems......................................................................................534
Troubleshooting SQL performance....................................................................................................545
Troubleshooting optimization............................................................................................................552
Troubleshooting partitioned database environments...................................................................... 554
Troubleshooting scripts..................................................................................................................... 556
Recompile the static section to collect section actuals after applying Fix Pack 1.......................... 556
Troubleshooting storage key support............................................................................................... 557
Troubleshooting the Db2 pureScale Feature..........................................................................................557
How to diagnose a problem............................................................................................................... 557
Understanding the Db2 pureScale Feature resource model............................................................ 559
Understanding how the Db2 pureScale Feature automatically handles failure.............................. 561
uDAPL connectivity in Db2 pureScale environments........................................................................561
Manual trace and log file collection...................................................................................................562
Installation, instance creation and rollback......................................................................................563
Response file installation FAQ........................................................................................................... 579
Post-installation................................................................................................................................. 579
Host or member issues...................................................................................................................... 592
Troubleshooting options for the db2cluster command.................................................................... 613
Uninstallation..................................................................................................................................... 616
Troubleshooting Db2 Text Search...........................................................................................................619
Using the Db2 trace facility for text search operations ....................................................................619
Logging and tracing for the Db2 Text Search server ........................................................................ 619
Monitoring queues for Text Search index updates........................................................................... 620
Troubleshooting hints and tips.......................................................................................................... 622
Learning more about troubleshooting.................................................................................................... 623
Learning more about troubleshooting tools...................................................................................... 623
Searching knowledge bases....................................................................................................................661
Searching troubleshooting resources................................................................................................661
Available troubleshooting resources.................................................................................................662
Getting DB2 product fixes....................................................................................................................... 662
Getting fixes....................................................................................................................................... 662
Support.................................................................................................................................................... 665
Contacting IBM Software Support.....................................................................................................665
Index................................................................................................................ 669
Figures
10. Logical table, record, and index structure for MDC and ITC tables.........................................................62
18. The FCM buffer pool when multiple logical partitions are used.............................................................. 87
24. Steps performed by the SQL and XQuery compiler............................................................................... 210
26. Sharing sets for table and block index scan sharing..............................................................................238
34. Optimizer decision path for both table partitioning and index ANDing.................................................263
Tables
1. Memory sets................................................................................................................................................ 83
3. Parameter combinations.............................................................................................................................95
5. Table types that are supported for online and offline reorganization..................................................... 118
13. Transactions against the ORG table under the CS isolation level......................................................... 156
14. .................................................................................................................................................................181
15. .................................................................................................................................................................182
16. .................................................................................................................................................................182
22. Lock Modes for RID Index Scans with a Single Qualifying Row............................................................ 190
23. Lock Modes for RID Index Scans with Start and Stop Predicates Only................................................ 190
24. Lock Modes for RID Index Scans with Index and Other Predicates (sargs, resids) Only..................... 191
25. Lock Modes for Index Scans Used for Deferred Data Page Access: RID Index Scan with No
Predicates................................................................................................................................................ 191
26. Lock Modes for Index Scans Used for Deferred Data Page Access: After a RID Index Scan with No
Predicates................................................................................................................................................ 191
27. Lock Modes for Index Scans Used for Deferred Data Page Access: RID Index Scan with
Predicates (sargs, resids)........................................................................................................................ 191
28. Lock Modes for Index Scans Used for Deferred Data Page Access: After a RID Index Scan with
Predicates (sargs, resids)........................................................................................................................ 192
29. Lock Modes for Index Scans Used for Deferred Data Page Access: RID Index Scan with Start and
Stop Predicates Only................................................................................................................................192
30. Lock Modes for Index Scans Used for Deferred Data Page Access: After a RID Index Scan with
Start and Stop Predicates Only................................................................................................................192
32. Lock Modes for Table Scans with Predicates on Dimension Columns Only......................................... 193
33. Lock Modes for Table Scans with Other Predicates (sargs, resids)...................................................... 193
35. Lock Modes for RID Index Scans with a Single Qualifying Row............................................................ 194
36. Lock Modes for RID Index Scans with Start and Stop Predicates Only................................................ 194
37. Lock Modes for RID Index Scans with Index Predicates Only.............................................................. 194
38. Lock Modes for RID Index Scans with Other Predicates (sargs, resids)............................................... 195
39. Lock Modes for Index Scans Used for Deferred Data Page Access: RID Index Scan with No
Predicates................................................................................................................................................ 195
40. Lock Modes for Index Scans Used for Deferred Data Page Access: After a RID Index Scan with No
Predicates................................................................................................................................................ 195
41. Lock Modes for Index Scans Used for Deferred Data Page Access: RID Index Scan with
Predicates (sargs, resids)........................................................................................................................ 196
42. Lock Modes for Index Scans Used for Deferred Data Page Access: After a RID Index Scan with
Predicates (sargs, resids)........................................................................................................................ 196
43. Lock Modes for Index Scans Used for Deferred Data Page Access: RID Index Scan with Start and
Stop Predicates Only................................................................................................................................196
44. Lock Modes for Index Scans Used for Deferred Data Page Access: After a RID Index Scan with
Start and Stop Predicates Only................................................................................................................196
46. Lock Modes for Index Scans with Predicates on Dimension Columns Only......................................... 197
47. Lock Modes for Index Scans with Start and Stop Predicates Only........................................................198
49. Lock Modes for Index Scans Used for Deferred Data Page Access: Block Index Scan with No
Predicates................................................................................................................................................ 198
50. Lock Modes for Index Scans Used for Deferred Data Page Access: After a Block Index Scan with
No Predicates........................................................................................................................................... 198
51. Lock Modes for Index Scans Used for Deferred Data Page Access: Block Index Scan with
Predicates on Dimension Columns Only................................................................................................. 199
52. Lock Modes for Index Scans Used for Deferred Data Page Access: After a Block Index Scan with
Predicates on Dimension Columns Only................................................................................................. 199
53. Lock Modes for Index Scans Used for Deferred Data Page Access: Block Index Scan with Start
and Stop Predicates Only.........................................................................................................................199
54. Lock Modes for Index Scans Used for Deferred Data Page Access: After a Block Index Scan with
Start and Stop Predicates Only................................................................................................................200
55. Lock Modes for Index Scans Used for Deferred Data Page Access: Block Index Scan with Other
Predicates (sargs, resids)........................................................................................................................ 200
56. Lock Modes for Index Scans Used for Deferred Data Page Access: After a Block Index Scan with
Other Predicates (sargs, resids).............................................................................................................. 200
65. STORE (63 rows)..................................................................................................................................... 395
72. Cardinality estimates before and after joining with DAILY_SALES....................................................... 397
73. Cardinality estimates before and after joining with DAILY_SALES....................................................... 398
81. Real-time statistics collection as a function of the value of the CURRENT EXPLAIN MODE special
register..................................................................................................................................................... 414
87. Feature comparison of db2dart and INSPECT for data objects............................................................ 458
89. Feature comparison of db2dart and INSPECT for block map objects.................................................. 459
90. Feature comparison of db2dart and INSPECT for long field and LOB objects......................................459
Chapter 1. Performance overview
Performance refers to the way that a computer system behaves in response to a particular workload.
Performance is measured in terms of system response time, throughput, and resource utilization.
Performance is also affected by:
• The resources that are available on the system
• How well those resources are used and shared
In general, you will want to tune your system to improve its cost-benefit ratio. Specific goals could
include:
• Processing larger, or more demanding, workloads without increasing processing costs
• Obtaining faster system response times, or higher throughput, without increasing processing costs
• Reducing processing costs without degrading service to users
Some benefits of performance tuning, such as a more efficient use of resources and the ability to add
more users to the system, are tangible. Other benefits, such as greater user satisfaction because of
quicker response times, are intangible.
Benchmark testing
Benchmark testing is a normal part of the application development life cycle. It is a team effort that
involves both application developers and database administrators (DBAs).
Benchmark testing is performed against a system to determine current performance and can be used to
improve application performance. If the application code has been written as efficiently as possible,
additional performance gains might be realized by tuning database and database manager configuration
parameters.
Different types of benchmark tests are used to discover specific kinds of information. For example:
• An infrastructure benchmark determines the throughput capabilities of the database manager under
certain limited laboratory conditions.
• An application benchmark determines the throughput capabilities of the database manager under
conditions that more closely reflect a production environment.
Benchmark testing to tune configuration parameters is based upon controlled conditions. Such testing
involves repeatedly running SQL from your application and changing the system configuration (and
perhaps the SQL) until the application runs as efficiently as possible.
The same approach can be used to tune other factors that affect performance, such as indexes, table
space configuration, and hardware configuration, to name a few.
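For example, the db2batch benchmark tool can be used to run a file of SQL statements under controlled
conditions and to report elapsed times. The following invocation is a simple sketch; the database name
and file names are placeholders:

db2batch -d sales -f queries.sql -r results.out -i complete

The -i complete option reports prepare, execute, and fetch times separately for each statement.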
Benchmark testing helps you to understand how the database manager responds to different conditions.
You can create scenarios that test deadlock handling, utility performance, different methods of loading
data, transaction rate characteristics as more users are added, and even the effect on the application of
using a new release of the database product.
Benchmark tests are based on a repeatable environment so that the same test run under the same
conditions will yield results that you can legitimately compare. You might begin by running the test
application in a normal environment. As you narrow down a performance problem, you can develop
specialized test cases that limit the scope of the function that you are testing. The specialized test cases
need not emulate an entire application to obtain valuable information. Start with simple measurements,
and increase the complexity only if necessary.
Characteristics of good benchmarks include:
• The tests are repeatable
• Each iteration of a test starts in the same system state
• No other functions or applications are unintentionally active in the system
• The hardware and software used for benchmark testing match your production environment
Note that started applications use memory even when they are idle. This increases the probability that
paging will skew the results of the benchmark and violates the repeatability criterion.
Benchmark preparation
There are certain prerequisites that must be satisfied before performance benchmark testing can be
initiated.
Before you start performance benchmark testing:
• Complete both the logical and physical design of the database against which your application will run
• Create tables, views, and indexes
• Normalize tables, bind application packages, and populate tables with realistic data; ensure that
appropriate statistics are available
• Plan to run against a production-size database, so that the application can test representative memory
requirements; if this is not possible, try to ensure that the proportions of available system resources to
Analysis shows that the CONNECT (statement 01) took 1.34 seconds to complete, the OPEN CURSOR
(statement 10) took 2 minutes and 8.15 seconds, the FETCH (statement 15) returned seven rows, with
the longest delay being 0.28 seconds, the CLOSE CURSOR (statement 20) took 0.84 seconds, and the
CONNECT RESET (statement 99) took 0.03 seconds to complete.
If your program can output data in a delimited ASCII format, the data could later be imported into a
database table or a spreadsheet for further statistical analysis.
A summary benchmark report might look like the following:
TOTAL_APP_COMMITS
These metrics are closely related to buffer pool hit ratios, but have a slightly different purpose. Although
you can consider target values for hit ratios, there are no possible targets for reads and writes per
transaction.
4. The ratio of rows read to rows returned:
ROWS_READ / ROWS_RETURNED
This calculation gives an indication of the average number of rows that are read from database tables
to find the rows that qualify. Low numbers are an indication of efficiency in locating data, and
generally show that indexes are being used effectively. For example, this number can be very high in
the case where the system does many table scans, and millions of rows have to be inspected to
determine if they qualify for the result set. Alternatively, this statistic can be very low in the case of
access to a table through a fully-qualified unique index. Index-only access plans (where no rows
need to be read from the table) do not cause ROWS_READ to increase.
In an OLTP environment, this metric is generally no higher than 2 or 3, indicating that most access is
through indexes instead of table scans. This metric is a simple way to monitor plan stability over time
- an unexpected increase is often an indication that an index is no longer being used and should be
investigated.
5. The amount of time spent sorting per transaction:
TOTAL_SORT_TIME / TOTAL_APP_COMMITS
This is an efficient way to handle sort statistics, because any extra time due to spilled sorts
automatically gets included here. That said, you might also want to collect TOTAL_SORTS and
SORT_OVERFLOWS for ease of analysis, especially if your system has a history of sorting issues.
6. The amount of lock wait time accumulated per thousand transactions:
(1000 * LOCK_WAIT_TIME) / TOTAL_APP_COMMITS
Excessive lock wait time often translates into poor response time, so it is important to monitor. The
value is normalized to one thousand transactions because lock wait time on a single transaction is
typically quite low. Scaling up to one thousand transactions provides measurements that are easier
to handle. (A sample query that computes several of these normalized metrics follows this list.)
7. The number of deadlocks and lock timeouts per thousand transactions:
(1000 * (DEADLOCKS + LOCK_TIMEOUTS)) / TOTAL_APP_COMMITS
Although deadlocks are comparatively rare in most production systems, lock timeouts can be more
common. The application usually has to handle them in a similar way: re-executing the transaction
from the beginning. Monitoring the rate at which this happens helps avoid the case where many
deadlocks or lock timeouts drive significant extra load on the system without the DBA being aware.
8. The number of dirty steal triggers per thousand transactions:
9. The number of package cache inserts per thousand transactions:
(1000 * PKG_CACHE_INSERTS) / TOTAL_APP_COMMITS
Package cache insertions are part of normal execution of the system; however, in large numbers,
they can represent a significant consumer of CPU time. In many well-designed systems, after the
system is running at steady-state, very few package cache inserts occur, because the system is using
or reusing static SQL or previously prepared dynamic SQL statements. In systems with a high traffic
of ad hoc dynamic SQL statements, SQL compilation and package cache inserts are unavoidable.
However, this metric is intended to watch for a third type of situation, one in which applications
unintentionally cause package cache churn by not reusing prepared statements, or by not using
parameter markers in their frequently executed SQL.
10. The time an agent waits for log records to be flushed to disk:
LOG_WRITE_TIME / TOTAL_APP_COMMITS
The transaction log has significant potential to be a system bottleneck, whether due to high levels of
activity, or to improper configuration, or other causes. By monitoring log activity, you can detect
problems both from the Db2 side (meaning an increase in number of log requests driven by the
application) and from the system side (often due to a decrease in log subsystem performance caused
by hardware or configuration problems).
11. In partitioned database environments, the number of fast communication manager (FCM) buffers
sent and received between partitions:
FCM_SENDS_TOTAL, FCM_RECVS_TOTAL
These give the rate of flow of data between different partitions in the cluster, and in particular,
whether the flow is balanced. Significant differences in the numbers of buffers received from
different partitions might indicate a skew in the amount of data that has been hashed to each
partition.
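A simple way to compute several of these normalized metrics is to query the MON_GET_DATABASE table
function. The following query is an illustrative sketch only; the monitor elements are reported as totals
since database activation, so take two samples and work with the difference between them when you
analyze a specific interval:

SELECT MEMBER,
       TOTAL_APP_COMMITS,
       -- rows read per row returned
       CASE WHEN ROWS_RETURNED > 0
            THEN DEC(ROWS_READ, 31, 2) / ROWS_RETURNED END AS ROWS_READ_PER_RETURNED,
       -- lock wait time per thousand transactions
       CASE WHEN TOTAL_APP_COMMITS > 0
            THEN 1000 * DEC(LOCK_WAIT_TIME, 31, 2) / TOTAL_APP_COMMITS END AS LOCK_WAIT_PER_1000_TX,
       -- deadlocks and lock timeouts per thousand transactions
       CASE WHEN TOTAL_APP_COMMITS > 0
            THEN 1000 * DEC(DEADLOCKS + LOCK_TIMEOUTS, 31, 2) / TOTAL_APP_COMMITS END AS DLCK_TIMEOUT_PER_1000_TX
FROM TABLE(MON_GET_DATABASE(-2)) AS T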
Procedure
• To start the governor, use the db2gov command, specifying the following required parameters:
START database-name
The database name that you specify must match the name of the database in the governor
configuration file.
config-file
The name of the governor configuration file for this database. If the file is not in the default
location, which is the sqllib directory, you must include the file path as well as the file name.
log-file
The base name of the log file for this governor. For a partitioned database, the database partition
number is added for each database partition on which a daemon is running for this instance of the
governor.
• To start the governor on a single database partition of a partitioned database, specify the
dbpartitionnum option.
For example, to start the governor on database partition 3 of a database named SALES, using a
configuration file named salescfg and a log file called saleslog, enter the following command:
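db2gov START sales dbpartitionnum 3 salescfg saleslog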
• To start the governor on all database partitions, enter the following command:
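db2gov START sales salescfg saleslog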
General clauses
The following clauses cannot be specified more than once in a governor configuration file.
dbname
The name or alias of the database to be monitored. This clause is required.
account n
The interval, in minutes, after which account records containing CPU usage statistics for each
connection are written. This option is not available on Windows operating systems. On some
platforms, CPU statistics are not available from the snapshot monitor. If this is the case, the account
clause is ignored.
If a short session occurs entirely within the account interval, no log record is written. When log
records are written, they contain CPU statistics that reflect CPU usage since the previous log record
for the connection. If the governor is stopped and then restarted, CPU usage might be reflected in two
log records; these can be identified through the application IDs in the log records.
interval n
The interval, in seconds, after which the daemon wakes up. If you do not specify this clause, the
default value of 120 seconds is used.
Rule clauses
Rule statements specify how applications are to be governed, and are assembled from smaller
components called rule clauses. If used, rule clauses must appear in a specific order in the rule
statement, as follows:
1. desc: A comment about the rule, enclosed by single or double quotation marks
2. time: The time at which the rule is evaluated
3. authid: One or more authorization IDs under which the application executes statements
4. applname: The name of the executable or object file that connects to the database. This name is case
sensitive. If the application name contains spaces, the name must be enclosed by double quotation
marks.
If more than one rule applies to an application, all are applied. Usually, the action that is associated with
the first limit encountered is the action that is applied first. An exception occurs if you specify a value of -1
for a rule clause: A subsequently specified value for the same clause can only override the previously
specified value; other clauses in the previous rule statement are still operative.
For example, one rule statement uses the rowssel 100000 and uowtime 3600 clauses to specify that
the priority of an application is decreased if its elapsed time is greater than 1 hour or if it selects more
than 100 000 rows. A subsequent rule uses the uowtime -1 clause to specify that the same application
can have unlimited elapsed time. In this case, if the application runs for more than 1 hour, its priority is
not changed. That is, uowtime -1 overrides uowtime 3600. However, if it selects more than 100 000
rows, its priority is lowered because rowssel 100000 still applies.
To ensure that a less restrictive rule overrides a more restrictive previous rule, specify -1 to clear the
previous rule before applying the new one. For example, in the following configuration file, the initial rule
limits all users to 5000 rows. The second rule clears this limit for ADMIN, and the third rule resets the
limit for ADMIN to 10000 rows.
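desc "Limit the rows selected by any application to 5000."
setlimit rowssel 5000;

desc "Remove the rowssel limit for ADMIN."
authid admin
setlimit rowssel -1;

desc "Set the rowssel limit for ADMIN to 10000 rows."
authid admin
setlimit rowssel 10000;

Other sample rules in a governor configuration file might look like the following: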
desc "Schedule all CPU hogs in one class, which will control consumption."
setlimit cpu 3600
action schedule class;
desc "Slow down the use of the Db2 CLP by the novice user."
authid novice
applname db2bp.exe
setlimit cpu 5 locks 100 rowssel 250;
desc "During the day, do not let anyone run for more than 10 seconds."
time 8:30 17:00 setlimit cpu 10 action force;
Limit clauses
setlimit
Specifies one or more limits for the governor to check. The limits must be -1 or greater than 0 (for
example, cpu -1 locks 1000 rowssel 10000). At least one limit must be specified, and any
limit that is not specified in a rule statement is not limited by that rule. The governor can check the
following limits:
cpu n
Specifies the number of CPU seconds that can be consumed by an application. If you specify -1,
the application's CPU usage is not limited.
idle n
Specifies the number of idle seconds that are allowed for a connection. If you specify -1, the
connection's idle time is not limited.
Note: Some database utilities, such as backup and restore, establish a connection to the database
and then perform work through engine dispatchable units (EDUs) that are not visible to the
governor. These database connections appear to be idle and might exceed the idle time limit. To
prevent the governor from taking action against these utilities, specify -1 for them through the
authorization ID that invoked them. For example, to prevent the governor from taking action
against utilities that are running under authorization ID DB2SYS, specify authid DB2SYS
setlimit idle -1.
locks n
Specifies the number of locks that an application can hold. If you specify -1, the number of locks
held by the application is not limited.
rowsread n
Specifies the number of rows that an application can read. If you specify -1, the number of rows
that the application can read is not limited. The maximum value that can be specified is
4 294 967 298.
Note: This limit is not the same as rowssel. The difference is that rowsread is the number of
rows that must be read to return the result set. This number includes engine reads of the catalog
tables and can be reduced when indexes are used.
rowssel n
Specifies the number of rows that can be returned to an application. This value is non-zero only at
the coordinator database partition. If you specify -1, the number of rows that can be returned is
not limited. The maximum value that can be specified is 4 294 967 298.
uowtime n
Specifies the number of seconds that can elapse from the time that a unit of work (UOW) first
becomes active. If you specify -1, the elapsed time is not limited.
Note: If you use the sqlmon API to deactivate the unit of work monitor switch or the timestamp
monitor switch, the governor's ability to govern applications based on unit of work elapsed time is
affected. The governor uses the monitor to collect information about the system. If a unit of work
(UOW) of the application was started before the governor starts, the governor does not govern
that UOW.
Action clauses
action
Specifies the action that is to be taken if one or more specified limits is exceeded. If a limit is
exceeded and the action clause is not specified, the governor reduces the priority of agents working
for the application by 10.
schedule [class]
Scheduling improves the priorities of agents working on applications. The goal is to minimize the
average response time while maintaining fairness across all applications.
The governor chooses the top applications for scheduling on the basis of the following criteria:
• The application holding the greatest number of locks (an attempt to reduce the number of lock
waits)
• The oldest application
• The application with the shortest estimated remaining run time (an attempt to allow as many
short-lived statements as possible to complete during the interval)
The top three applications in each criterion are given higher priorities than all other applications.
That is, the top application in each criterion group is given the highest priority, the next highest
application is given the second highest priority, and the third-highest application is given the third
highest priority. If a single application is ranked in the top three for more than one criterion, it is
• Ensure that each of several user groups (for example, organizational departments) gets equal
prioritization. If one group is running a large number of applications, the administrator can
ensure that other groups are still able to obtain reasonable response times for their
applications. For example, in a case involving three departments (Finance, Inventory, and
Planning), all the Finance users could be put into one group, all the Inventory users could be put
into a second group, and all the Planning users could be put into a third group. The processing
power would be split more or less evenly among the three departments.
The following example shows a portion of a governor configuration file that illustrates this point:
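desc "Schedule Finance department users in their own class."
authid finuser1, finuser2, finuser3
setlimit cpu -1
action schedule class;

desc "Schedule Inventory department users in their own class."
authid invuser1, invuser2, invuser3
setlimit cpu -1
action schedule class;

desc "Schedule Planning department users in their own class."
authid planuser1, planuser2, planuser3
setlimit cpu -1
action schedule class;

(The authorization IDs shown are placeholders; substitute the IDs of the users in each department.)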
The format of the Date and Time fields is yyyy-mm-dd-hh.mm.ss. You can merge the log files for each
database partition by sorting on this field. The DBPartitionNum field contains the number of the database
partition on which the governor is running.
The RecType field contains different values, depending on the type of record being written to the log. The
values that can be recorded are:
• ACCOUNT: the application accounting statistics
• ERROR: an error occurred
• FORCE: an application was forced
• NICE: the priority of an application was changed
• READCFG: the governor read the configuration file
• SCHEDGRP: a change in agent priorities occurred
• START: the governor was started
• STOP: the governor was stopped
• WARNING: a warning occurred
Some of these values are described in more detail in the following list.
ACCOUNT
An ACCOUNT record is written in the following situations:
ERROR
An ERROR record is written when the governor daemon needs to shut down.
FORCE
A FORCE record is written when the governor forces an application, based on rules in the governor
configuration file. The FORCE record has the following format:
where:
coord_partition
Specifies the number of the application's coordinator database partition.
cfg_line
Specifies the line number in the governor configuration file where the rule causing the application
to be forced is located.
restriction_exceeded
Provides details about how the rule was violated. Valid values are:
• CPU: the total application USR CPU plus SYS CPU time, in seconds
• Locks: the total number of locks held by the application
• Rowssel: the total number of rows selected by the application
• Rowsread: the total number of rows read by the application
• Idle: the amount of time during which the application was idle
• ET: the elapsed time since the application's current unit of work started (the uowtime
setlimit was exceeded)
NICE
A NICE record is written when the governor changes the priority of an application, based on rules in
the governor configuration file. The NICE record has the following format:
where:
nice_value
Specifies the increment or decrement that will be made to the priority value for the application's
agent process.
cfg_line
Specifies the line number in the governor configuration file where the rule causing the
application's priority to be changed is located.
restriction_exceeded
Provides details about how the rule was violated. Valid values are:
• CPU: the total application USR CPU plus SYS CPU time, in seconds
• Locks: the total number of locks held by the application
• Rowssel: the total number of rows selected by the application
• Rowsread: the total number of rows read by the application
where:
cfg_line
Specifies the line number in the governor configuration file where the rule causing the application
to be scheduled is located.
restriction_exceeded
Provides details about how the rule was violated. Valid values are:
• CPU: the total application USR CPU plus SYS CPU time, in seconds
• Locks: the total number of locks held by the application
• Rowssel: the total number of rows selected by the application
• Rowsread: the total number of rows read by the application
• Idle: the amount of time during which the application was idle
• ET: the elapsed time since the application's current unit of work started (the uowtime
setlimit was exceeded)
START
A START record is written when the governor starts. The START record has the following format:
Database = <database_name>
STOP
A STOP record is written when the governor stops. It has the following format:
Database = <database_name>
WARNING
A WARNING record is written in the following situations:
• The sqlefrce API was called to force an application, but it returned a positive SQLCODE.
• A snapshot call returned a positive SQLCODE that was not 1611 (SQL1611W).
• A snapshot call returned a negative SQLCODE that was not -1224 (SQL1224N) or -1032
(SQL1032N). These return codes occur when a previously active instance has been stopped.
• On Linux and UNIX, an attempt to install a signal handler has failed.
Because standard values are written, you can query the log files for different types of actions. The
Message field provides other nonstandard information that depends on the type of record. For example, a
FORCE or NICE record includes application information in the Message field, whereas an ERROR record
includes an error message.
A governor log file might look like the following example:
Procedure
• To stop the governor, use the db2gov command, specifying the STOP parameter.
Example
For example, to stop the governor on all database partitions of the SALES database, enter the following
command:
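db2gov STOP sales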
To stop the governor on only database partition 3, enter the following command:
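db2gov STOP sales dbpartitionnum 3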
For db2mon.sql and db2mon_export.sql to properly collect monitoring data, monitoring must be
enabled at the database level with the following database configuration parameters:
• MON_ACT_METRICS must be set at least to BASE, which is the default value.
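For example, the following command sets this parameter for a database named MyDatabaseName:

db2 update db cfg for MyDatabaseName using MON_ACT_METRICS BASE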
Procedure
Follow these steps to collect and report performance monitor data with db2mon:
1. Run db2mon by using one of the following three approaches:
• Online mode by using db2mon.sh:
a. Ensure that performance information is collected during normal database activity or in parallel
with a test workload that is running on the database.
b. From the command line, type:
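db2mon.sh MyDatabaseName > db2mon.out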
where MyDatabaseName is the name of the database that you are monitoring.
Note: It is advised that you collect data for a maximum of 300 seconds (5 minutes) to avoid
wrapping some of the counters. For monitoring longer periods, collecting successive reports is
preferred.
• Online mode by using an existing database connection:
a. Run db2mon on your current database connection, as follows:
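db2 -tvf db2mon.sql > db2mon.out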
Important: If you run db2mon on your current connection and interrupt db2mon.sql, for example
by pressing Ctrl-C while it is running, the CURRENT SCHEMA special register might still be set to
SESSION_USER. As a result, any SQL statements that are run after the interruption might be
affected. If the connection is interrupted, you might need to manually change CURRENT
SCHEMA to the original value. db2mon uses CURRENT SCHEMA to resolve its table references to
the DGTTs that are used during online collection in db2mon.sql.
• Offline mode:
a. Run the following command to generate an output report:
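db2 -tvf db2mon_export.sql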
db2mon_export.sql exports the contents of all MON_GET functions that db2mon uses to IXF
files that are created in the current working directory. The content is exported twice: when the
script starts, and when the script ends. The I/O overhead of the export operation is low.
Note: Exporting all columns can be helpful when reference to the original data is needed for
analysis. Ad hoc queries from the command line or altered versions of db2mon_report.sql
can take advantage of the extra metrics when used with the exported data set.
b. The IXF files can be transferred to another system for import into another Db2 database to
create the report. The report can be created on any operating system or version of Db2. For
versions of Db2 before Version 11.1.3.3, you also need to copy the db2mon SQL scripts. All of
the IXF files must be transferred (by using the scp command, for example) to the system where
the report will be created. For example, to transfer all of the IXF files to the directory reports/
2018-02-16 by using the dbreports account on the analysis1 system:
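scp *.ixf dbreports@analysis1:reports/2018-02-16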
c. On the system where the report will be created, go to the directory where the IXF files are
located.
d. Run the following command to import the data from the IXF files:
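db2 -tvf db2mon_import.sql

After the data is imported, run db2mon_report.sql in the same way to generate the report.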
Important: If you run db2mon on your current connection and interrupt db2mon.sql, for example
by pressing Ctrl-C while it is running, the CURRENT SCHEMA special register is likely to be set to
SESSION_USER. As a result, the subsequent SQL that is run on that connection is affected.
db2mon uses the proper schema for permanent tables that are produced by
db2mon_import.sql and db2mon_report.sql.
db2mon_import.sql uses the Db2 IMPORT utility to reconstitute the MON_GET data back into
Db2 tables for analysis. IMPORT is used because it creates tables automatically. You do not
need to reproduce CREATE TABLE statements to match the source system's tables.
2. If you ran the report in online mode, check the report for errors.
Many errors can be reported in db2mon.out when the script attempts to drop tables that do not exist.
These errors can be ignored. Other types of errors might indicate that a user temporary table space
doesn't exist, or that the script was generated for the wrong version of Db2.
3. View the report and analyze its contents. Details are provided in the Results section.
Note: The report section begins after the text "STARTS". The report can be wide, so it is best to use a
text editor or viewer that is capable of viewing long lines, such as less or vim.
Results
The report includes many useful queries, roughly grouped into the following sections:
• Point-in-time data (such as currently running SQL, utilities, and lock waits), which is collected at the
beginning and end of the interval.
• Cumulative data, which is measured over the whole interval:
– Data that is collected at various hierarchical levels (such as database level, table space level, buffer
pool level, table level, query level, and connection level).
– Data that is collected for different deployment types (examples include standard Db2 ESE, Db2
pureScale, and Db2 with BLU).
– Environment information (such as database and instance configuration, registry variables, CPU
counts and usage, and memory usage).
Example
Sample output is provided to illustrate different ways the output report can be analyzed.
The sample output is provided in an increasingly granular order, with high-level sections presented first,
and sections with finer details provided later. Get a general sense of the performance of your database
from the high-level sections, then use that information to determine which detailed sections to inspect to
gain deeper insight.
Most of the tables in your report are wider than the following output. Some of the tables here are trimmed
for readability. In some cases, the output is split and wrapped.
1. Use the "Throughput metrics at database level" section to identify work that is done by the system.
======================================
Throughput metrics at database level
======================================
The previous output shows four members, each completing approximately 2400 transactions per
second (CMT_PER_S) or 25,000 SQL statements per second (ACT_PER_S), and additional metrics
shown in the table. These transactions and statements are measured over 35 seconds (TS_DELTA).
The process of collecting data can sometimes increase the planned interval slightly. In this case, the
planned interval was 30 seconds.
Abbreviated names are common in the report:
• ACT_PER_S: Activities per second
• CMT_PER_S: Commits per second
• RB_PER_S: Rollbacks per second
• DDLCK_PER_S: Deadlocks per second
• SEL_P_S: Select statements per second
• UID_P_S: Update/Insert/Delete statements per second
• ROWS_INS_P_S: Rows that are inserted per second
2. Use the "Time breakdown at database level" section to identify processing time for the database.
=====================================================
Time breakdown at database level (wait + processing)
=====================================================
4 record(s) selected.
3. Use the "Wait times at database level" section to examine where the database spends its wait time.
==============================
Wait times at database level
==============================
4 record(s) selected.
In this example, each member spends approximately 67% of each request waiting
(PCT_RQST_WAIT). Most of this wait time is spent on cluster caching facility (CF) communications
(PCT_CF), followed by log writes (PCT_LG_DST) and lock wait (PCT_LOCK).
4. You can use various sections from the report to identify statements that use the most database
processing time.
Various sections of the report show statement data from the package cache, which means that
statements finished running during the monitoring interval. Statements still running when monitoring
finished are shown in the point-in-time section at the beginning of the report.
The top 100 statements by total activity time are listed, and different views are shown in tables, with
a focus on views such as basic metrics, wait time, sorting, and I/O.
Compare the "Wait times at database level" section (from Example “3” on page 27) with the "Top
SQL statements by execution time" section in this example to determine whether high wait times are
caused by either:
• A few statements
• A system-wide issue that affects all statements
System-wide issues might be improved by configuration changes or changes in hardware. Statement
performance can be improved by using techniques such as altering SQL, adding or removing indexes,
and refreshing statistics with the RUNSTATS command.
The "Top SQL statements by execution time" section shows raw CPU usage.
======================================
Top SQL statements by execution time
======================================
The "Wait time breakdown" section shows statements that are not running because they are waiting
for resources, such as (but not limited to) disk, latches, and locks.
==============================================================
Wait time breakdown for top SQL statements by execution time
==============================================================
14 record(s) selected.
Tip: The "Statement and Plan Identifiers" section shows the EXECUTABLE_ID for each of the top 100
statements. You can use the EXECUTABLE_ID with the MON_GET_PKG_CACHE_STMT function to get
the access plan for any of these statements. You can also use EXECUTABLE_ID to determine the
explain plan, by using the following command:
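CALL EXPLAIN_FROM_SECTION (x'<executable_id>', 'M', NULL, 0, NULL, ?, ?, ?, ?, ?)

In this sketch, x'<executable_id>' is a placeholder for the EXECUTABLE_ID value taken from the
report, and 'M' indicates that the section is taken from the in-memory package cache. The populated
explain tables can then be formatted with the db2exfmt command.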
The "Top SQL statements by execution time, aggregated by PLANID" section is similar to the "Top
SQL statement by execution time" section. All statements that differ by literal values (and have the
same PLANID hash value) are aggregated to show the total costs. This information is useful when a
transactional application is driving lots of SQL statements with literal values.
The "IO statistics per statement" section shows the buffer pool activity for the top 100 SQL
statements.
==========================================================
IO statistics per stmt - top statements by execution time
==========================================================
21 record(s) selected.
I/O information can indicate whether a statement is using an index. AVG_I_LRD stands for average
index logical reads per execution. AVG_D_LRD stands for average data logical reads. In the previous
output, the first four entries have high AVG_I_LRD values and AVG_D_LRD values of zero. This
combination of values is indicative of an index-based plan. If a statement has high data logical reads
and low index logical reads, changes to the indexes might improve performance.
5. Use the "Database log write times" section to examine log disk performance.
Statements whose execution times are relatively higher than those of other statements might suggest
a poorly optimized access plan. If adding an index does not provide a solution, compare the highest
wait times from the database and statement levels. This comparison identifies the largest contributors
to wait time; then consult the appropriate report sections for more detailed information.
In db2mon, the individual wait time percentages (such as log disk wait time) are calculated as a
percent of total request time, not as a percent of total wait time. Total wait time (where everything
adds up to 100% wait time) can be calculated by using the MONREPORT.DBSUMMARY procedure.
==========================
Database log write times
==========================
4 record(s) selected.
In the previous output, the log write time per I/O (LOG_WRITE_TIME_PER_IO_MS) is approximately
0.4 milliseconds, which is optimal. Values vary between systems, but log write times per I/O above
4.0 milliseconds indicate that changes to the storage configuration might be required.
Reconfiguration includes changes to cache, RAID, and the number of LUNs per file system.
6. Use the "Bufferpool read statistics" section and the "Disk read and write I/O times" section to
examine read times.
Note: The output in the "Bufferpool read statistics" section includes the following abbreviations:
• POOL_DATA_L_READS: Buffer pool data page logical reads (base table ORGANIZE BY ROW access)
• POOL_DATA_P_READS: Buffer pool data page physical reads
• POOL_INDEX_L_READS: Buffer pool index page logical reads (index access)
• POOL_INDEX_P_READS: Buffer pool index page physical reads
• POOL_COL_L_READS: Buffer pool column-organized data page logical reads (base table ORGANIZE
BY COLUMN access)
• POOL_COL_P_READS: Buffer pool column-organized data page physical reads
============================
Bufferpool read statistics
============================
High pool read times are a common issue. They are reported in the PCT_POOL_RD column in the
"Wait time breakdown for top SQL statements by execution time" section (from Example “4” on
page 28). This issue is especially common when the system is reading many pages from disk into
the buffer pool. This type of heavy page read activity typically occurs shortly after database
activation, or when a new application starts to make connections. Average pool read times
(AVG_READ_TIME) are reported at the buffer pool level and the table space level. Similarly, a high
percent direct I/O time for large objects can be tracked in AVG_DRCT_READ_TIME and
AVG_DRCT_WRITE_TIME.
===============================
Disk read and write I/O times
===============================
10 record(s) selected.
7. Use the "Round-trip CF" section to determine round-trip time for messages between members and
CFs.
Percentage CF wait time is reported in PCT_CF in "Wait times at database level" (Example “3” on
page 27) and "Wait time breakdown" (Example “4” on page 28). If the PCT_CF values in your report
are high, the "Round-trip CF" section shows performance data on messages that are exchanged
between the Db2 member and the CFs.
===================================================================
Round-trip CF command execution counts and average response times
===================================================================
12 record(s) selected.
8. Use the "Page reclaim metrics" section to examine page reclaim metrics for index and data pages.
Reclaim wait time is reported in PCT_RCLM in "Wait times at database level" (Example “3” on page
27) and "Wait time breakdown" (Example “4” on page 28). If PCT_RCLM values in your report are
above zero, the following section shows which table has the most reclaim activity, and whether the
table or an index on the table is reclaimed:
===============================================
Page reclaim metrics for index and data pages
===============================================
select member, substr(tabschema,1,20) as tabschema, substr(tabname,1,40) as tabname,
substr(objtype,1,10) as objtype, data_partition_id, iid, (page ...
11 record(s) selected.
The sample output shows the highest reclaim activity on indexes on ORDERS and ORDER_LINE across all
the members. However, these RECLAIM_WAIT_TIME values are not high, which can be confirmed
by examining the PCT_RCLM (percent reclaim wait time) value in the "Wait times at database level"
section. High reclaim activity on index pages is the most common case, and can be reduced by using
RANDOM indexes, CURRENT MEMBER partitioning, and range partitioning.
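For example, an index whose key is an ever-increasing value (so that all members insert into the same trailing leaf pages) can be re-created with RANDOM key ordering to spread inserts across the index. The column and index names in this sketch are hypothetical:

DROP INDEX ORDL_SEQ_IX;
CREATE INDEX ORDL_SEQ_IX ON ORDER_LINE (OL_SEQNUMBER RANDOM);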
9. Use the "Page reclaim metrics for SMP pages" section to identify reclaim activity of space map pages
(SMPs). For details about space map page reclaim metrics, see "spacemappage_reclaim_wait_time -
Space map page reclaim wait time monitor element " in Database Monitoring Guide and Reference.
The following output shows reclaimed SMP pages, which are often due to heavy inserts into a table
whose table space extent size is too small.
====================================
Page reclaim metrics for SMP pages
====================================
10 record(s) selected.
10. Use the "Latch wait metrics" section to identify latch waits.
Latch wait time percentages are reported in PCT_LTCH in "Wait times at database level" (Example “3”
on page 27) and "Wait time breakdown" (Example “4” on page 28). Latch wait time percentages
(PCT_LTCH) higher than 15% to 20% are considered high. The following section shows details for latch
waits by type:
====================
Latch wait metrics
====================
13 record(s) selected.
Most values for TIME_PER_LATCH_WAIT_MS are well below a few seconds, measured in a 30-
second period, across all agents that operate on the system. Therefore, this system shows no
significant latching issue.
System architecture
Db2 architecture and process overview
On the client side, local or remote applications are linked with the Db2 client library. Local clients
communicate using shared memory and semaphores; remote clients use a protocol, such as named pipes
(NPIPE) or TCP/IP. On the server side, activity is controlled by engine dispatchable units (EDUs).
Figure 3 on page 37 shows a general overview of the Db2 architecture and processes.
Client programs
Client programs can be remote, or local (running on the same machine as the database server). Client
programs make first contact with a database through a communication listener.
Listeners
Communication listeners start when the Db2 database server starts. There is a listener for each
configured communications protocol, and an interprocess communications (IPC) listener (db2ipccm) for
local client programs. Listeners include:
• db2ipccm, for local client connections
• db2tcpcm, for TCP/IP connections
• db2tcpdm, for TCP/IP discovery tool requests
Agents
All connection requests from local or remote client programs (applications) are allocated a corresponding
coordinator agent (db2agent). When the coordinator agent is created, it performs all database requests
on behalf of the application.
db2fmp
The fenced mode process is responsible for executing fenced stored procedures and user-defined
functions outside of the firewall. The db2fmp process is always a separate process, but might be
multithreaded, depending on the types of routines that it executes.
db2vend
The db2vend process is a process to execute vendor code on behalf of an EDU; for example, to execute a
user exit program for log archiving (UNIX only).
Database EDUs
The following list includes some of the important EDUs that are used by each database:
• db2cmpd, for the compression daemon, which executes tasks related to compression. In a partitioned database
environment, a db2cmpd EDU runs on each partition independently.
• db2dlock, for deadlock detection. In a partitioned database environment, an additional thread
(db2glock) is used to coordinate the information that is collected by the db2dlock EDU on each
partition; db2glock runs only on the catalog partition. In a Db2 pureScale environment, a db2glock
EDU is used to coordinate the information that is collected by the db2dlock EDU on each member. A
db2glock EDU is started on each member, but only one is active.
• db2fw, the event monitor fast writer, which is used for high volume, parallel writing of event monitor
data to tables, files, or pipes
– db2fwx, an event monitor fast writer thread where "x" identifies the thread number. During database
activation, the Db2 engine sets the number of db2fwx threads to a value that is optimal for the
performance of event monitors and avoids potential performance problems when different types of
workloads are run. The number of db2fwx threads equals the number of logical CPUs on the system
(for multi-core CPUs, each core counts as one logical CPU). For DPF instances, the number of db2fwx
Database agents
When an application accesses a database, several processes or threads begin to perform the various
application tasks. These tasks include logging, communication, and prefetching. Database agents are
threads within the database manager that are used to service application requests. In Version 9.5, agents
are run as threads on all platforms.
The maximum number of application connections is controlled by the max_connections database
manager configuration parameter. The work of each application connection is coordinated by a single
worker agent. A worker agent carries out application requests but has no permanent attachment to any
particular application. Coordinator agents exhibit the longest association with an application, because
they remain attached to it until the application disconnects. The only exception to this rule occurs when
the engine concentrator is enabled, in which case a coordinator agent can terminate that association at
transaction boundaries (COMMIT or ROLLBACK).
There are three types of worker agents:
• Idle agents
This is the simplest form of worker agent. It does not have an outbound connection, and it does not
have a local database connection or an instance attachment.
• Active coordinator agents
Each database connection from a client application has a single active agent that coordinates its work
on the database. After the coordinator agent is created, it performs all database requests on behalf of
its application, and communicates to other agents using interprocess communication (IPC) or remote
communication protocols. Each agent operates with its own private memory and shares database
manager and database global resources, such as the buffer pool, with other agents. When a transaction
completes, the active coordinator agent might become an inactive agent. When a client disconnects
from a database or detaches from an instance, its coordinator agent will be:
– An active coordinator agent if other connections are waiting
– Freed and marked as idle if no connections are waiting, and the maximum number of pool agents is
being automatically managed or has not been reached
– Terminated and its storage freed if no connections are waiting, and the maximum number of pool
agents has been reached
• Subagents
The coordinator agent distributes database requests to subagents, and these subagents perform the
requests for the application. After the coordinator agent is created, it handles all database requests on
behalf of its application by coordinating the subagents that perform requests against the database. In
Db2 Version 9.5, subagents can also exist in nonpartitioned environments and in environments where
intraquery parallelism is not enabled.
Agents that are not performing work for any application and that are waiting to be assigned are
considered to be idle agents and reside in an agent pool. These agents are available for requests from
coordinator agents operating on behalf of client programs, or for subagents operating on behalf of existing
coordinator agents. The number of available agents depends on the value of the num_poolagents
database manager configuration parameter.
If no idle agents exist when an agent is required, a new agent is created dynamically. Because creating a
new agent requires a certain amount of overhead, CONNECT and ATTACH performance is better if an idle
agent can be activated for a client.
When a subagent is performing work for an application, it is associated with that application. After it
completes the assigned work, it can be placed in the agent pool, but it remains associated with the
original application. When the application requests additional work, the database manager first checks
the idle pool for associated agents before it creates a new agent.
Example
Consider the following scenario:
• The max_connections parameter is set to AUTOMATIC and has a current value of 300
• The max_coordagents parameter is set to AUTOMATIC and has a current value of 100
The ratio of max_connections to max_coordagents is 300:100. The database manager creates new
coordinating agents as connections come in, and connection concentration is applied only when needed.
These settings result in the following actions:
• Connections 1 to 100 create new coordinating agents
• Connections 101 to 300 do not create new coordinating agents; they share the 100 agents that have
been created already
• Connections 301 to 400 create new coordinating agents
• Connections 401 to 600 do not create new coordinating agents; they share the 200 agents that have
been created already
• and so on...
In this example, it is assumed that the connected applications are driving enough work to warrant
creation of new coordinating agents at each step. After some period of time, if the connected applications
are no longer driving sufficient amounts of work, coordinating agents will become inactive and might be
terminated.
If the number of connections is reduced, but the amount of work being driven by the remaining
connections is high, the number of coordinating agents might not be reduced right away.
Fenced user-defined functions (UDFs) and stored procedures, which are not shown in the figure, are
managed to minimize costs that are associated with their creation and destruction. The default value of
the keepfenced database manager configuration parameter is YES, which keeps the stored procedure
process available for reuse at the next procedure call.
Note: Unfenced UDFs and stored procedures run directly in an agent's address space for better
performance. However, because they have unrestricted access to the agent's address space, they must
be rigorously tested before being used.
Figure 7 on page 48 shows the similarities and differences between the single database partition
processing model and the multiple database partition processing model.
In a multiple database partition environment, the database partition on which the CREATE DATABASE
command was issued is called the catalog database partition. It is on this database partition that the
system catalog tables are stored. The system catalog is a repository of all of the information about objects
in the database.
As shown in Figure 7 on page 48, because Application A creates the PROD database on Node0000, the
catalog for the PROD database is also created on this database partition. Similarly, because Application B
creates the TEST database on Node0001, the catalog for the TEST database is created on this database
partition. It is a good idea to create your databases on different database partitions to balance the extra
activity that is associated with the catalog for each database across the database partitions in your
environment.
Examples
• Consider a single-partition database to which, on average, 1000 users are connected simultaneously. At
times, the number of connected users might be higher. The number of concurrent transactions can be
as high as 200, but it is never higher than 250. Transactions are short.
For this workload, you could set the following database manager configuration parameters:
– Set max_coordagents to 250 to support the maximum number of concurrent transactions.
– Set max_connections to AUTOMATIC with a value of 1000 to ensure support for any number of
connections; in this example, any value greater than 250 will ensure that the connection
concentrator is enabled.
– Leave num_poolagents at the default value, which should ensure that database agents are
available to service incoming client requests, and that little overhead will result from the creation of
new agents.
• Consider a single-partition database to which, on average, 1000 users are connected simultaneously. At
times, the number of connected users might reach 2000. An average of 500 users are expected to be
executing work at any given time. The number of concurrent transactions is approximately 250. Five
This means that as the number of connections beyond 1000 increases, additional coordinating agents
will be created as needed, with a maximum to be determined by the total number of connections. As
the workload increases, the database manager attempts to maintain a relatively stable ratio of
connections to coordinating agents.
• Suppose that you do not want to enable the connection concentrator, but you do want to limit the
number of connected users. To limit the number of simultaneously connected users to 250, for
example, you could set the following database manager configuration parameters:
– Set max_coordagents to 250.
– Set max_connections to 250.
• Suppose that you do not want to enable the connection concentrator, and you do not want to limit the
number of connected users. You could update the database manager configuration as follows:
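The commands themselves are not reproduced in this excerpt. A minimal sketch from the CLP, assuming that leaving both parameters AUTOMATIC (and therefore equal) keeps the connection concentrator disabled, is:

db2 UPDATE DBM CFG USING MAX_COORDAGENTS AUTOMATIC
db2 UPDATE DBM CFG USING MAX_CONNECTIONS AUTOMATIC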
Hardware configuration
CPU capacity is one of the main independent variables in configuring a system for performance, because
most other hardware configuration choices typically flow from it. However, it is not easy to predict how
much CPU capacity is required for a given workload. In business intelligence (BI) environments, 200-300 GB of active raw data
per processor core is a reasonable estimate. For other environments, a sound approach is to gauge the
amount of CPU required, based on one or more existing Db2 systems. For example, if the new system
needs to handle 50% more users, each running SQL that is at least as complex as that on an existing
system, it would be reasonable to assume that 50% more CPU capacity is required. Likewise, other
factors that predict a change in CPU usage, such as different throughput requirements or changes in the
use of triggers or referential integrity, should be taken into account as well.
After you have the best idea of CPU requirements (derived from available information), other aspects of
hardware configuration start to fall into place. Although you must consider the required system disk
capacity in gigabytes or terabytes, the most important factors regarding performance are the capacity in
I/Os per second (IOPS), or in megabytes per second of data transfer. In practical terms, this is
determined by the number of individual disks involved.
Why is that the case? The evolution of CPUs over the past decade has seen incredible increases in speed,
whereas the evolution of disks has been more in terms of their capacity and cost. There have been
improvements in disk seek time and transfer rate, but they haven't kept pace with CPU speeds. So to
achieve the aggregate performance needed with modern systems, using multiple disks is more important
than ever, especially for systems that will drive a significant amount of random disk I/O. Often, the
temptation is to use close to the minimum number of disks that can contain the total amount of data in
the system, but this generally leads to very poor performance.
In the case of RAID storage, or for individually addressable drives, a rule-of-thumb is to configure at least
ten to twenty disks per processor core. For storage servers, a similar number is required. However, in this
case, a bit of extra caution is warranted. Allocation of space on storage servers is often done more with an
eye to capacity rather than throughput. It is a very good idea to understand the physical layout of
database storage, to ensure that the inadvertent overlap of logically separate storage does not occur. For
example, a reasonable allocation for a 4-way system might be eight arrays of eight drives each. However,
if all eight arrays share the same eight underlying physical drives, the throughput of the configuration
would be drastically reduced, compared to eight arrays spread over 64 physical drives.
It is good practice to set aside some dedicated (unshared) disk for the Db2 transaction logs. This is
because the I/O characteristics of the logs are very different from those of Db2 containers, and the
competition between log I/O and other types of I/O can result in a logging bottleneck, especially in
systems with a high degree of write activity.
AIX configuration
There are relatively few AIX parameters that need to be changed to achieve good performance. Again, if
there are specific settings already in place for your system (for example, a BW or SAP configuration),
those should take precedence over the following general guidelines. A sample command sequence follows the list.
• The VMO parameter minperm% should be set to 3. This is the default value in AIX 7.1.
• The AIO parameter maxservers can be initially left at the default value of ten per CPU. After the
system is active, maxservers is tuned as follows:
1. Collect the output of the ps -elfk | grep aio command and determine if all asynchronous I/O
(AIO) kernel processes (aioservers) are consuming the same amount of CPU time.
2. If they are, maxservers might be set too low. Increase maxservers by 10%, and repeat step 1.
3. If some aioservers are using less CPU time than others, the system has at least as many of them as it
needs. If more than 10% of aioservers are using less CPU, reduce maxservers by 10% and repeat
step 1.
• The AIO parameter maxreqs should be set to MAX( NUM_IOCLEANERS x 256, 4096 ). This
parameter controls the maximum number of outstanding AIO requests.
• The hdisk parameter queue_depth should be based on the number of physical disks in the array. For
example, for IBM disks, the default value for queue_depth is 3, and the suggested value would be 3 x
number-of-devices. This parameter controls the number of queuable disk requests.
• The disk adapter parameter num_cmd_elems should be set to the sum of queue_depth for all devices
connected to the adapter. This parameter controls the number of requests that can be queued to the
adapter.
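The following AIX commands are a sketch of how some of these settings can be inspected and changed; the device names and values are illustrative, and tunable names can vary by AIX release:

# Check and persistently set the minperm% VMM tunable
vmo -a | grep minperm
vmo -p -o minperm%=3

# Check the queue depth of a disk and raise it (applied at the next restart)
lsattr -El hdisk4 -a queue_depth
chdev -l hdisk4 -a queue_depth=24 -P

# Check the command elements currently configured on a Fibre Channel adapter
lsattr -El fcs0 -a num_cmd_elems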
Linux configuration
The Db2 database manager automatically updates key Linux kernel parameters to satisfy the
requirements of a wide variety of configurations.
For more information, see "Kernel parameter requirements (Linux)" in Installing Db2 Servers.
Instance configuration
When you start a new Db2 instance, there are a number of steps that you can follow to establish a basic
configuration.
• You can use the Configuration Advisor to obtain recommendations for the initial values of the buffer
pool size, database configuration parameters, and database manager configuration parameters. To use
the Configuration Advisor, specify the AUTOCONFIGURE command for an existing database, or specify
the AUTOCONFIGURE parameter on the CREATE DATABASE command. You can display the
recommended values or apply them by using the APPLY parameter on the CREATE DATABASE
command. The recommendations are based on input that you provide and system information that the
advisor gathers.
• Consult the summary tables (see "Configuration parameters summary") that list and briefly describe
each configuration parameter that is available to the database manager or a database. These summary
tables contain a column that indicates whether tuning a particular parameter is likely to produce a high,
medium, low, or no performance change. Use these tables to find the parameters that might help you to
realize the largest performance improvements in your environment.
• Use the ACTIVATE DATABASE command to activate a database and start up all necessary database
services, so that the database is available for connection and use by any application. In a partitioned
database environment, this command activates the database on all database partitions and avoids the
startup time that is required to initialize the database when the first application connects. A sample
command sequence follows this list.
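This sequence is a sketch only; the database name and the AUTOCONFIGURE input values are illustrative:

db2 CREATE DATABASE salesdb AUTOCONFIGURE USING mem_percent 60 workload_type mixed APPLY DB AND DBM
db2 ACTIVATE DATABASE salesdb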
– OVERHEAD provides an estimate of the time (in milliseconds) that is required by the container before
any data is read into memory, and can be estimated as follows:
   OVERHEAD = average seek time in milliseconds + (0.5 * rotational latency)
where:
- 0.5 represents the average overhead of one half rotation
- Rotational latency (in milliseconds) is calculated for each full rotation, as follows:
   (1 / RPM) * 60 * 1000
  where:
  • You divide by rotations per minute to get minutes per rotation
  • You multiply by 60 seconds per minute
  • You multiply by 1000 milliseconds per second
For example, assume that a disk performs 7200 rotations per minute. Using the rotational-latency
formula:
   (1 / 7200) * 60 * 1000 = 8.33 milliseconds
This value can be used to estimate the overhead as follows, assuming an average seek time of 11
milliseconds:
   OVERHEAD = 11 + (0.5 * 8.33)
            = 15.165 milliseconds
– TRANSFERRATE provides an estimate of the time (in milliseconds) that is required to read one page
of data into memory.
If each table space container is a single physical disk, you can use the following formula to estimate
the transfer cost in milliseconds per page:
where:
Figure 8. Logical table, record, and index structure for standard tables
Logically, index pages are organized as a B-tree that can efficiently locate table records that have a
specific key value. The number of entities on an index page is not fixed, but depends on the size of the
key. For tables in database managed space (DMS) table spaces, record identifiers (RIDs) in the index
pages use table space-relative page numbers, not object-relative page numbers. This enables an index
scan to directly access the data pages without requiring an extent map page (EMP) for mapping.
When a table page is reorganized, embedded free space that is left on the page after a record is physically
deleted is converted to usable free space.
The Db2 data server supports different page sizes. Use larger page sizes for workloads that tend to access
rows sequentially. For example, sequential access is commonly used for decision support applications, or
when temporary tables are being used extensively. Use smaller page sizes for workloads that tend to
access rows randomly. For example, random access is often used in online transaction processing (OLTP)
environments.
The first block contains special internal records, including the free space control record (FSCR), that are
used by the Db2 server to manage the table. In subsequent blocks, the first page contains the FSCR. An
FSCR maps the free space for new records that exists on each page of the block. This available free space
is used when inserting records into the table.
Indexes
Index structure
The database manager uses a B+ tree structure for index storage.
A B+ tree has several levels, as shown in Figure 11 on page 63; "rid" refers to a record ID (RID).
Monitoring AIC
You can monitor AIC with the LIST UTILITIES command. Each index cleaner appears as a separate
utility in the output. The following is an example of output from the LIST UTILITIES SHOW DETAIL
command:
ID = 2
Type = ASYNCHRONOUS INDEX CLEANUP
Database Name = WSDB
Partition Number = 0
Description = Table: USER1.SALES, Index: USER1.I2
Start Time = 12/15/2005 11:15:01.967939
State = Executing
Invocation Type = Automatic
Throttling:
Priority = 50
Progress Monitoring:
Total Work = 5 pages
Completed Work = 0 pages
Start Time = 12/15/2005 11:15:01.979033
ID = 1
Type = ASYNCHRONOUS INDEX CLEANUP
Database Name = WSDB
Partition Number = 0
Description = Table: USER1.SALES, Index: USER1.I1
Start Time = 12/15/2005 11:15:01.978554
State = Executing
Invocation Type = Automatic
Throttling:
Priority = 50
Progress Monitoring:
Total Work = 5 pages
Completed Work = 0 pages
Start Time = 12/15/2005 11:15:01.980524
In this case, there are two cleaners operating on the USER1.SALES table. One cleaner is processing
index I1, and the other is processing index I2. The progress monitoring section shows the estimated total
number of index pages that need cleaning and the current number of clean index pages.
The State field indicates the current state of a cleaner. The normal state is Executing, but the cleaner
might be in Waiting state if it is waiting to be assigned to an available database agent or if the cleaner is
temporarily suspended because of lock contention.
ID = 2
Type = MDC ROLLOUT INDEX CLEANUP
Database Name = WSDB
Partition Number = 0
Description = TABLE.<schema_name>.<table_name>
Start Time = 06/12/2006 08:56:33.390158
State = Executing
Invocation Type = Automatic
Throttling:
Priority = 50
Progress Monitoring:
Estimated Percentage Complete = 83
Phase Number = 1
Description = <schema_name>.<index_name>
Specifying LASTNAME as an include column rather than part of the index key means that LASTNAME
is stored only on the leaf pages of the index.
– Create relational indexes on columns that are used in the WHERE clauses of frequently run queries.
In the following example, the WHERE clause will likely benefit from an index on WORKDEPT, unless
the WORKDEPT column contains many duplicate values.
– Create relational indexes with a compound key that names each column referenced in a query. When
an index is specified in this way, relational data can be retrieved from the index only, which is more
efficient than accessing the table.
For example, consider the following query:
select lastname
from employee
where workdept in ('A00','D11','D21')
If a relational index is defined on the WORKDEPT and LASTNAME columns of the EMPLOYEE table,
the query might be processed more efficiently by scanning the index rather than the entire table.
Because the predicate references WORKDEPT, this column should be the first key column of the
relational index. A sample index definition follows this item.
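This definition is a sketch only; the index name is hypothetical, and the column order is chosen so that the predicate column (WORKDEPT) leads:

CREATE INDEX EMP_DEPT_LNAME_IX ON EMPLOYEE (WORKDEPT, LASTNAME)

With this index, the query above can be satisfied by index-only access.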
• Searching tables efficiently
Decide between ascending and descending key order, depending on the order that will be used most
often. Although values can be searched in reverse direction if you specify the ALLOW REVERSE SCANS
option on the CREATE INDEX statement, scans in the specified index order perform slightly better than
reverse scans.
• Accessing larger tables efficiently
Use relational indexes to optimize frequent queries against tables with more than a few data pages, as
recorded in the NPAGES column of the SYSCAT.TABLES catalog view. You should:
– Create an index on any column that you will use to join tables.
– Create an index on any column that you will be searching for specific values on a regular basis.
• Improving the performance of update or delete operations
– To improve the performance of such operations against a parent table, create relational indexes on
foreign keys.
– To improve the performance of such operations against REFRESH IMMEDIATE and INCREMENTAL
materialized query tables (MQTs), create unique relational indexes on the implied unique key of the
MQT, which is composed of the columns in the GROUP BY clause of the MQT definition.
• Improving join performance
If you have more than one choice for the first key column in a multiple-column relational index, use the
column that is most often specified with an equijoin predicate (expression1 = expression2) or the
column with the greatest number of distinct values as the first key column.
• Sorting
– For fast sort operations, create relational indexes on columns that are frequently used to sort the
relational data.
– To avoid some sorts, use the CREATE INDEX statement to define primary keys and unique keys
whenever possible.
– Create a relational index to order the rows in whatever sequence is required by a frequently run
query. Ordering is required by the DISTINCT, GROUP BY, and ORDER BY clauses.
The database manager can use an index that is defined on the WORKDEPT column to eliminate
duplicate values. The same index could also be used to group values, as in the following example that
uses a GROUP BY clause:
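The example statement is not included in this excerpt; a query of the following general form, reusing the table and column from the earlier examples, illustrates the kind of GROUP BY that such an index can support:

select workdept, count(*)
from employee
group by workdept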
Using CURRENT MEMBER default value in a Db2 pureScale environment to improve contention issues
In a Db2 pureScale environment, you can set the default value for a column to the CURRENT MEMBER
special register. This member information can then be used to partition a table or an index, and therefore
reduce database contention.
The following scenarios outline some of the situations where creating a new index using a CURRENT
MEMBER column improves database contention issues. Once this new index is created, the Db2
pureScale cluster can make use of the member number information to reduce the amount of active
sharing between members when referencing the table index. This resource reduction can improve the
speed and overall performance of the Db2 pureScale environment.
2. Create (or drop and re-create) the index on the sequence column (seqnumber in this example), adding
the new column to the index:
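The statements for this step are not included in this excerpt. The following sketch shows the general pattern, assuming that an earlier step added an implicitly hidden CURMEM column that defaults to CURRENT MEMBER; the table and index names are hypothetical, and placing CURMEM ahead of the sequence column is one way to keep each member's inserts on separate leaf pages:

ALTER TABLE transactions ADD COLUMN curmem SMALLINT DEFAULT CURRENT MEMBER IMPLICITLY HIDDEN;
CREATE INDEX seq_ix ON transactions (curmem, seqnumber);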
A similar approach can be taken with database designs where the sequence is a series of timestamp
values. The index for the timestamp column would use the PAGE SPLIT HIGH option, and include the new
CURRENT MEMBER column as well.
2. Create (or drop and re-create) the index on the column or columns that have few distinct values (for
example, zipcode and country):
In all these cases, index compression will likely reduce the size of the index on the new CURRENT
MEMBER values.
The Figure 13 on page 76 diagram shows a partitioned index on a partitioned table that spans two
database partitions and resides in a single table space.
The Figure 14 on page 77 diagram shows a mix of partitioned and nonpartitioned indexes on a
partitioned table.
The nonpartitioned index X1 refers to rows in all of the data partitions. By contrast, the partitioned
indexes X2 and X3 refer only to rows in the data partition with which they are associated. Table space TS3
also shows the index partitions sharing the table space of the data partitions with which they are
associated. This configuration is the default for partitioned indexes.
You can override the default location for nonpartitioned and partitioned indexes, although the way that
you do this is different for each. With nonpartitioned indexes, you can specify a table space when you
create the index; for partitioned indexes, you need to determine the table spaces in which the index
partitions are stored when you create the table.
Nonpartitioned indexes
To override the index location for nonpartitioned indexes, use the IN clause on the CREATE INDEX
statement to specify an alternative table space location for the index. You can place different indexes
in different table spaces, as required. If you create a partitioned table without specifying where to
place its nonpartitioned indexes, and you then create an index by using a CREATE INDEX statement
that does not specify a table space, the index is created in the table space of the first attached or
visible data partition. The index table space is determined as follows:
Case 1:
   When an index table space is specified in the CREATE INDEX...IN tbspace statement, the
   specified table space is used for this index.
Case 2:
   When an index table space is specified in the CREATE TABLE...INDEX IN tbspace statement, the
   specified table space is used for this index.
Case 3:
   When no table space is specified, the table space that is used by the first attached or visible
   data partition is chosen.
Partitioned indexes
By default, index partitions are placed in the same table space as the data partitions that they
reference. To override this default behavior, you must use the INDEX IN clause for each data partition
that you define by using the CREATE TABLE statement. In other words, if you plan to use partitioned
indexes for a partitioned table, you must anticipate where you want the index partitions to be stored
when you create the table. If you try to use the INDEX IN clause when creating a partitioned index,
you receive an error message.
Example 1: Given partitioned table SALES (a int, b int, c int), create a unique index A_IDX.
Because the table SALES is partitioned, index a_idx is also created as a partitioned index.
Example 2: Create index B_IDX.
Example 3: To override the default location for the index partitions in a partitioned index, use the INDEX
IN clause for each partition that you define when creating the partitioned table. In the example that
follows, indexes for the table Z are created in table space TS3.
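The DDL for these examples is not included in this excerpt. The following statements are a hedged reconstruction of the general pattern; the partitioning column, ranges, and table space names are illustrative:

-- Example 1: on the partitioned table SALES, the unique index is created as a partitioned index
CREATE UNIQUE INDEX a_idx ON sales (a)

-- Example 2: a second index, also partitioned by default
CREATE INDEX b_idx ON sales (b)

-- Example 3: place the index partitions in table space TS3 by specifying INDEX IN
--            for each data partition when the table is created
CREATE TABLE z (a INT, b INT)
   PARTITION BY RANGE (a)
   (STARTING FROM (1) ENDING AT (100) INDEX IN ts3,
    STARTING FROM (101) ENDING AT (200) INDEX IN ts3)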
Although the database server does not enforce this correlation, there is an expectation that all keys in the
index will be grouped together by partition IDs to achieve good clustering. For example, suppose that a
Federated databases
Resource utilization
Memory allocation
Memory allocation and deallocation occurs at various times. Memory might be allocated to a particular
memory area when a specific event occurs (for example, when an application connects), or it might be
reallocated in response to a configuration change.
Figure 16 on page 81 shows the different memory areas that the database manager allocates for various
uses and the configuration parameters that enable you to control the size of these memory areas. Note
that in a partitioned database environment, each database partition has its own database manager shared
memory set.
Memory is allocated by the database manager whenever one of the following events occurs:
When the database manager starts (db2start)
Database manager shared memory (also known as instance shared memory) remains allocated until
the database manager stops (db2stop). This area contains information that the database manager
uses to manage activity across all database connections. Db2 automatically controls the size of the
database manager shared memory.
When a database is activated or connected to for the first time
Database global memory is used across all applications that connect to the database. The size of the
database global memory is specified by the database_memory database configuration parameter.
By default, this parameter is set to automatic, allowing Db2 to calculate the initial amount of memory
allocated for the database and to automatically configure the database memory size during run time
based on the needs of the database.
The following memory areas can be dynamically adjusted:
• Buffer pools (using the ALTER BUFFERPOOL statement)
• Database heap (including log buffers)
• Utility heap
• Package cache
• Catalog cache
• Lock list
The sortheap, sheapthres_shr, and sheapthres configuration parameters are also dynamically
updatable. The only restriction is that sheapthres cannot be dynamically changed from 0 to a value
that is greater than zero, or vice versa.
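For example, the following CLP statements adjust two of these areas dynamically; the database name, buffer pool name, and sizes are illustrative:

db2 "ALTER BUFFERPOOL IBMDEFAULTBP SIZE 100000"
db2 UPDATE DB CFG FOR sales USING SORTHEAP 4096 IMMEDIATE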
Shared sort operations are performed by default, and the amount of database shared memory that
can be used by sort memory consumers at any one time is determined by the value of the
sheapthres_shr database configuration parameter. Private sort operations are performed only if
intrapartition parallelism, database partitioning, and the connection concentrator are all disabled, and
the sheapthres database manager configuration parameter is set to a non-zero value.
Figure 18. The FCM buffer pool when multiple logical partitions are used
The number of FCM buffers for each database partition is controlled by the fcm_num_buffers database
manager configuration parameter. By default, this parameter is set to automatic. To tune this parameter
manually, use data from the buff_free and buff_free_bottom system monitor elements.
The number of FCM channels for each database partition is controlled by the fcm_num_channels
database manager configuration parameter. By default, this parameter is set to automatic. To tune this
parameter manually, use data from the ch_free and ch_free_bottom system monitor elements.
The Db2 database manager can automatically manage FCM memory resources by allocating more FCM
buffers and channels as needed. This leads to improved performance and prevents "out of FCM resource"
runtime errors. On the Linux operating system, the database manager can preallocate a larger amount of
system memory for FCM buffers and channels, up to a maximum default amount of 4 GB. Memory space
is impacted only when additional FCM buffers or channels are required. To enable this behavior, set the
FCM_MAXIMIZE_SET_SIZE option of the DB2_FCM_SETTINGS registry variable to YES (or TRUE). YES is
the default value.
Starting with Db2 Version 10.5 Fix Pack 5, the following memory-related database configuration
parameters can also be automatically tuned in a Db2 pureScale environment:
• cf_db_mem_sz - CF Database memory
• cf_gbp_sz - Group buffer pool
• cf_lock_sz - CF Lock manager
• cf_sca_sz - Shared communication area
Self-tuning memory
A memory-tuning feature simplifies the task of memory configuration by automatically setting values for
several memory configuration parameters. When enabled, the memory tuner dynamically distributes
available memory resources among the following memory consumers: buffer pools, locking memory,
package cache, and sort memory.
The tuner works within the memory limits that are defined by the database_memory configuration
parameter. The value of this parameter can be automatically tuned as well. When self-tuning is enabled
(when the value of database_memory has been set to AUTOMATIC), the tuner determines the overall
memory requirements for the database and increases or decreases the amount of memory allocated for
database shared memory, depending on current database requirements. For example, if current database
requirements are high and there is sufficient free memory on the system, more memory is allocated for
database shared memory. If the database memory requirements decrease, or if the amount of free
memory on the system becomes too low, some database shared memory is released. If large pages or
pinned memory are enabled, the STMM will not tune the overall database memory configuration, and you
need to assign a specific amount of memory to database memory by setting the DATABASE_MEMORY
configuration parameter to a specific value. For more information, see the database_memory
configuration parameter and the DB2_LARGE_PAGE_MEM registry variable.
When enabled, the memory tuner dynamically distributes available memory resources between several
memory consumers, including buffer pools, locking memory, package cache, and sort memory.
Procedure
1. Enable self-tuning memory for the database by setting the self_tuning_mem database configuration
parameter to ON using the UPDATE DATABASE CONFIGURATION command or the db2CfgSet API.
2. To enable the self tuning of memory areas that are controlled by memory configuration parameters,
set the relevant configuration parameters to AUTOMATIC using the UPDATE DATABASE
CONFIGURATION command or the db2CfgSet API.
3. To enable the self tuning of a buffer pool, set the buffer pool size to AUTOMATIC using the CREATE
BUFFERPOOL statement or the ALTER BUFFERPOOL statement. In a partitioned database
environment, that buffer pool should not have any entries in SYSCAT.BUFFERPOOLDBPARTITIONS.
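A sketch of this sequence from the CLP follows; the database name, buffer pool name, and the particular parameters chosen are illustrative:

db2 UPDATE DB CFG FOR sales USING SELF_TUNING_MEM ON
db2 UPDATE DB CFG FOR sales USING LOCKLIST AUTOMATIC MAXLOCKS AUTOMATIC
db2 UPDATE DB CFG FOR sales USING PCKCACHESZ AUTOMATIC SORTHEAP AUTOMATIC SHEAPTHRES_SHR AUTOMATIC
db2 "ALTER BUFFERPOOL IBMDEFAULTBP SIZE AUTOMATIC"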
Results
Note:
1. Because self-tuned memory is distributed between different memory consumers, at least two memory
areas must be concurrently enabled for self tuning at any given time; for example, locking memory and
database shared memory. The memory tuner actively tunes memory on the system (the value of the
Procedure
1. Disable self-tuning memory for the database by setting the self_tuning_mem database
configuration parameter to OFF using the UPDATE DATABASE CONFIGURATION command or the
db2CfgSet API.
2. To disable the self tuning of memory areas that are controlled by memory configuration parameters,
set the relevant configuration parameters to MANUAL or specify numeric parameter values using the
UPDATE DATABASE CONFIGURATION command or the db2CfgSet API.
3. To disable the self tuning of a buffer pool, set the buffer pool size to a specific value using the ALTER
BUFFERPOOL statement.
Results
Note:
• In some cases, a memory configuration parameter can be enabled for self tuning only if another related
memory configuration parameter is also enabled. This means that, for example, disabling self-tuning
memory for the locklist or the sortheap database configuration parameter disables self-tuning
memory for the maxlocks or the sheapthres_shr database configuration parameter, respectively.
Procedure
• To view the settings for configuration parameters, use one of the following methods:
• Use the GET DATABASE CONFIGURATION command, specifying the SHOW DETAIL parameter.
The memory consumers that can be enabled for self tuning are grouped together in the output as
follows:
---------------------------------------------------------------------------------------------
 Self tuning memory                     (SELF_TUNING_MEM) = ON (Active)       ON
 Size of database shared memory (4KB)   (DATABASE_MEMORY) = AUTOMATIC(37200)  AUTOMATIC(37200)
 Max storage for lock list (4KB)        (LOCKLIST)        = AUTOMATIC(7456)   AUTOMATIC(7456)
 Percent. of lock lists per application (MAXLOCKS)        = AUTOMATIC(98)     AUTOMATIC(98)
 Package cache size (4KB)               (PCKCACHESZ)      = AUTOMATIC(5600)   AUTOMATIC(5600)
 Sort heap thres for shared sorts (4KB) (SHEAPTHRES_SHR)  = AUTOMATIC(5000)   AUTOMATIC(5000)
 Sort list heap (4KB)                   (SORTHEAP)        = AUTOMATIC(256)    AUTOMATIC(256)
SQLF_OFF 0
SQLF_ON_ACTIVE 2
SQLF_ON_INACTIVE 3
SQLF_ON_ACTIVE indicates that self tuning is both enabled and active, whereas
SQLF_ON_INACTIVE indicates that self tuning is enabled but currently inactive.
• To view the self-tuning settings for buffer pools, use one of the following methods:
• To retrieve a list of the buffer pools that are enabled for self tuning from the command line, use the
following query:
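The query itself is not reproduced in this excerpt. Because NPAGES is set to -2 for buffer pools that are enabled for self tuning (as described below), a query of the following form returns the list:

SELECT BPNAME, NPAGES FROM SYSCAT.BUFFERPOOLS WHERE NPAGES = -2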
When self tuning is enabled for a buffer pool, the NPAGES field in the SYSCAT.BUFFERPOOLS view
for that particular buffer pool is set to -2. When self tuning is disabled, the NPAGES field is set to
the current size of the buffer pool.
• To determine the current size of buffer pools that are enabled for self tuning, use the GET
SNAPSHOT command and examine the current size of the buffer pools (the value of the
bp_cur_buffsz monitor element):
An ALTER BUFFERPOOL statement that specifies the size of a buffer pool on a particular database
partition creates an exception entry (or updates an existing entry) for that buffer pool in the
SYSCAT.BUFFERPOOLDBPARTITIONS catalog view. If an exception entry for a buffer pool exists,
Here is a summary of the members on which the STMM tuner is active based on the value in the SYSCAT
table.
Note that when the tuning member changes, some of the data collected from the member that was running
the tuner is discarded. This data must be recollected on the new tuning member. During the short period
of time when the data is being recollected, the memory tuner still tunes the system; however, the
tuning can occur slightly differently than it did on the original member.
Starting the memory tuner in a Db2 pureScale environment
In a Db2 pureScale environment, the memory tuner will run whenever the database is active on one or
more members that have self_tuning_mem set to ON.
Disabling self-tuning memory for a specific member
• To disable self-tuning memory for a subset of database members, set the self_tuning_mem database
configuration parameter to OFF for those members.
• To disable self-tuning memory for a subset of the memory consumers that are controlled by
configuration parameters on a specific member, set the value of the relevant configuration parameter to
MANUAL or to a specific value for that member.
db2set DB2_DATABASE_CF_MEMORY=AUTO
Setting DB2_DATABASE_CF_MEMORY to AUTO turns on CF self-tuning memory for all databases that have
the CF memory consumer parameters (cf_gbp_sz, cf_lock_sz, cf_sca_sz) set to AUTOMATIC.
However, for databases that have the CF memory consumer parameters set to fixed values, CF self-tuning
memory remains off.
Note:
1. In Db2 Version 10.5 Fix Pack 5 and later fix packs, if you are applying an online fix pack, you cannot set
registry variable DB2_DATABASE_CF_MEMORY until after the instance is committed to the new fix pack
level.
2. High availability disaster recovery (HADR) in a Db2 pureScale environment can use CF self-tuning
memory. However, CF memory tuning occurs on the primary site only. If the registry variable is set on
the standby site, the registry variable takes effect when the standby site becomes the primary site.
CF self-tuning memory can be used at different levels:
CF self-tuning memory = ON
CF database memory size (4KB) (CF_DB_MEM_SZ) = AUTOMATIC(8383744) AUTOMATIC(8383744)
Group buffer pool size (4KB) (CF_GBP_SZ) = AUTOMATIC(6589696) AUTOMATIC(6589696)
Global lock memory size (4KB) (CF_LOCK_SZ) = AUTOMATIC(1257728) AUTOMATIC(1257728)
Shared communication area size (4KB) (CF_SCA_SZ) = AUTOMATIC(135245) AUTOMATIC(135245)
Smart array size (4KB) (CF_LIST_SZ) = AUTOMATIC(315571) AUTOMATIC(315571)
When explicitly disabled by changing the registry variable DB2_DATABASE_CF_MEMORY from AUTO to a
numeric value, CF memory tuning is turned off for all databases that are activated. However, the CF
memory usage by the database and the CF memory consumer parameters is not adjusted immediately for
these databases. The current CF memory sizes remain at the current level. A database manager restart is
required.
CF self-tuning memory in a multiple database environment with cf_db_mem_sz set to AUTOMATIC
If you are running in an environment with one database with database manager configuration parameter
numdb set to 2 (or higher), when CF self-tuning memory is turned on, that database can use almost all CF
memory. (Some memory is reserved for additional database activation.) Later when another database is
added, after that database is started, the already active database automatically gives up CF memory to
the newly activated database as needed, until a workload-based distribution of CF memory is
reached.
For example:
• DB2_DATABASE_CF_MEMORY is set to AUTO
• numdb is set to 3, indicating a maximum number of three active databases in the instance
• cf_db_mem_sz is set to AUTOMATIC
Collect the following data by issuing the command: get snapshot for bufferpools on <dbname>
The tuning partition is updated asynchronously or at the next database startup. To have the memory
tuner automatically select the tuning partition, enter -1 for the partitionnum value.
Page-cleaner agents
In a well-tuned system, it is usually the page-cleaner agents that write changed or dirty pages to disk.
Page-cleaner agents perform I/O as background processes and allow applications to run faster because
their agents can perform actual transaction work. Page-cleaner agents are sometimes referred to as
asynchronous page cleaners or asynchronous buffer writers, because they are not coordinated with the
work of other agents and work only when required.
To improve performance for update-intensive workloads, you might want to enable proactive page
cleaning, whereby page cleaners behave more proactively in choosing which dirty pages get written out at
any given point in time. This is particularly true if snapshots reveal that there are a significant number of
synchronous data-page or index-page writes in relation to the number of asynchronous data-page or
index-page writes.
Figure 19 on page 101 illustrates how the work of managing the buffer pool can be shared between page-
cleaner agents and database agents.
Sequential prefetching
Reading several consecutive pages into the buffer pool using a single I/O operation can greatly reduce
your application overhead.
Prefetching starts when the database manager determines that sequential I/O is appropriate and that
prefetching might improve performance. In cases such as table scans and table sorts, the database
manager chooses the appropriate type of prefetching. The following example, which probably requires a
table scan, would be a good candidate for sequential prefetching:
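The example statement is not reproduced here; a statement of the following general form, which scans the whole table with no index-friendly predicate (the table name is taken from the earlier examples), is the kind of candidate meant:

select * from employee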
With smart data and smart index prefetching, both sequential and readahead prefetching are enabled,
which is the default. Sequential detection prefetching is used initially, until a threshold of non-prefetched
pages is reached or, in some cases, until the MAXPAGES estimate made by the optimizer is exceeded. When
the threshold of non-prefetched pages is reached or the MAXPAGES estimate is exceeded, readahead
prefetching is enabled.
Sequential detection
Sometimes, it is not immediately apparent that sequential prefetching will improve performance. In such
cases, the database manager can monitor I/O and activate prefetching if sequential page reading is
occurring. This type of sequential prefetching, known as sequential detection, applies to both index and
data pages. Use the seqdetect database configuration parameter to control whether the database
manager performs sequential detection or readahead prefetching.
For example, if sequential detection is enabled, the following SQL statement might benefit from
sequential prefetching:
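The statement is not included in this excerpt. Based on the explanation that follows (an index on EMPNO and nearly sequential data page reads), it is a range query of roughly this form; the range values are illustrative:

select name from employee
where empno between 100 and 3000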
In this example, the optimizer might have started to scan the table using an index on the EMPNO column.
If the table is highly clustered with respect to this index, the data page reads will be almost sequential,
and prefetching might improve performance. Similarly, if many index pages must be examined, and the
database manager detects that sequential page reading of the index pages is occurring, index page
prefetching is likely.
Readahead prefetching
Readahead prefetching looks ahead in the index to determine the exact data pages and index leaf pages
that ISCAN-FETCH and index scan operations will access, and prefetches them.
While readahead prefetching provides all the data and index pages needed during the index scan (and no
pages that are not needed), it also requires additional resources to locate those pages. For highly
sequential data and indexes, sequential detection prefetching will normally out-perform readahead
prefetching.
With smart data and smart index prefetching, both sequential and readahead prefetching are enabled,
which is the default. Sequential detection prefetching is used initially, until a threshold of non-prefetched
pages is reached or, in some cases, until the MAXPAGES estimate made by the optimizer is exceeded;
readahead prefetching is then enabled.
Restrictions
If index predicates must be evaluated during an index scan, and the optimizer determines that the index
predicates for a particular index scan have a compound selectivity rate less than 90% (not many rows
qualify), data readahead prefetching is disabled for that index scan. Note that this is a compound
selectivity taking into account all index predicates for that particular index scan. If the query optimizer
enables readahead prefetching for lower predicate selectivities, it might cause many unnecessary pages
to be prefetched.
Data readahead prefetching is also disabled while scanning a non-partitioned index on a range-
partitioned table to prevent a prefetch request from containing page references from multiple partitions.
For smart data and smart index prefetching, which can use readahead prefetching, these prefetching
techniques apply only to index scan operations and do not support XML, extended, and Text Search text
indexes.
List prefetching
List prefetching (or list sequential prefetching) is a way to access data pages efficiently, even when those
pages are not contiguous.
List prefetching can be used in conjunction with either single or multiple index access.
If the optimizer uses an index to access rows, it can defer reading the data pages until all of the row
identifiers (RIDs) have been obtained from the index. For example, the optimizer could perform an index
scan to determine the rows and data pages to retrieve.
If the data is not clustered according to this index, list prefetching includes a step that sorts the list of
RIDs that were obtained from the index scan.
1
The user application passes the request to the database agent that has been assigned to the user
application by the database manager.
2, 3
The database agent determines that prefetching should be used to obtain the data that is required to
satisfy the request, and writes a prefetch request to the I/O server queue.
4, 5
The first available I/O server reads the prefetch request from the queue and then reads the data from
the table space into the buffer pool. The number of I/O servers that can simultaneously fetch data
from a table space depends on the number of prefetch requests in the queue and the number of I/O
servers specified by the num_ioservers database configuration parameter.
6
The database agent performs the necessary operations on the data pages in the buffer pool and
returns the result to the user application.
Procedure
To configure IOCP:
1. To check whether the IOCP module is installed on your system, enter the following command:
$ lslpp -l bos.iocp.rte
Path: /etc/objrepos
bos.iocp.rte 5.3.0.50 COMMITTED I/O Completion Ports API
2. Check whether the status of the IOCP port is Available by entering the following command:
# smitty iocp
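The remaining steps of the procedure are not included in this excerpt. As a hedged sketch, if the IOCP device is in the Defined state rather than Available, it can usually be brought online with commands such as the following (iocp0 is the usual device name):

# chdev -l iocp0 -a autoconfig='available'
# mkdev -l iocp0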
Data organization
Over time, data in your tables can become fragmented, increasing the size of tables and indexes as
records become distributed over more and more data pages. This can increase the number of pages that
must be read in order to access the data.
Procedure
The steps to perform an index or table reorganization are as follows:
1. Determine whether you need to reorganize any tables or indexes.
2. Choose a reorganization method.
3. Perform the reorganization of identified objects.
4. Optional: Monitor the progress of reorganization.
5. Determine whether or not the reorganization was successful.
For offline table reorganization and any index reorganization, the operation is synchronous, and the
outcome is apparent upon completion of the operation. For online table reorganization, the operation
is asynchronous, and details are available from the history file.
6. Collect statistics on reorganized objects.
7. Rebind applications that access reorganized objects.
Table reorganization
After many changes to table data, logically sequential data might reside on nonsequential data pages, so
that the database manager might need to perform additional read operations to access data. If many
rows have been deleted, additional read operations are also required. In this case, you might consider
reorganizing the table to match the index and to reclaim space.
You can also reorganize the system catalog tables.
Because reorganizing a table usually takes more time than updating statistics, you could execute the
RUNSTATS command to refresh the current statistics for your data, and then rebind your applications. If
refreshed statistics do not improve performance, reorganization might help.
The following factors can indicate a need for table reorganization:
• There has been a high volume of insert, update, and delete activity against tables that are accessed by
queries.
• There have been significant changes in the performance of queries that use an index with a high cluster
ratio.
• Executing the RUNSTATS command to refresh table statistics does not improve performance.
• Output from the REORGCHK command indicates a need for table reorganization.
Note: With Db2 V9.7 Fix Pack 1 and later releases, higher data availability for a data partitioned table with
only partitioned indexes (except system-generated XML path indexes) is achieved by reorganizing data for
a specific data partition. Partition-level reorganization performs a table reorganization on a specified data
partition while the remaining data partitions of the table remain accessible. The output from the
REORGCHK command for a partitioned table contains statistics and recommendations for performing
partition-level reorganizations.
REORG TABLE commands and REORG INDEXES ALL commands can be issued on a data partitioned
table to concurrently reorganize different data partitions or partitioned indexes on a partition. When
concurrently reorganizing data partitions or the partitioned indexes on a partition, users can access the
unaffected partitions but cannot access the affected partitions. All the following criteria must be met to
issue REORG commands that operate concurrently on the same table:
• Each REORG command must specify a different partition with the ON DATA PARTITION clause.
• Each REORG command must use the ALLOW NO ACCESS mode to restrict access to the data partitions.
• The partitioned table must have only partitioned indexes if issuing REORG TABLE commands. No
nonpartitioned indexes (except system-generated XML path indexes) can be defined on the table.
In IBM Data Studio Version 3.1 or later, you can use the task assistant for reorganizing tables. Task
assistants can guide you through the process of setting options, reviewing the automatically generated
Table 5. Table types that are supported for online and offline reorganization

Table type                                           Support offline reorganization   Support online reorganization
Multidimensional clustering tables (MDC)             Yes (see note 1)                 Yes (see note 8)
Insert time clustering tables (ITC)                  Yes (see notes 1, 7)             Yes (see notes 6, 7)
Range-clustered tables (RCT)                         No (see note 2)                  No
Append mode tables                                   Yes                              No (see note 3)
Tables with long field or large object (LOB) data    Yes (see note 5)                 Yes (see note 5)
Notes:
1. Because clustering is automatically maintained through MDC block indexes, reorganization of an
MDC table involves space reclamation only. No indexes can be specified. Similarly, for ITC tables, you
cannot specify a reorganization with a clustering index.
2. The range area of an RCT always remains clustered.
3. Online reorganization can be run after append mode is disabled.
4. Reorganizing long field or large object (LOB) data can take a significant amount of time, and does not
improve query performance. Reorganization is done only for space reclamation purposes.
5. Online table reorganization does not reorganize the LONG/LOB data, but reorganizes the other
columns.
6. Online reorganization of an ITC table is supported. The reorganization is done with the existing
RECLAIM EXTENTS table clause of the REORG command.
7. The RECLAIM EXTENTS table clause of the REORG command consolidates sparse extents implicitly.
This consolidation leads to more space reclamation, but a longer duration for utility execution when
compared to Db2 Version 10.1.
8. Not supported when RECLAIM EXTENTS is used.
Note:
You can use the online table move stored procedure as an alternative approach to INPLACE
reorganization. See "Moving tables online by using the ADMIN_MOVE_TABLE procedure".
For tables with XML columns, classic table reorganization can take a long time because the system-
generated XML indexes must be rebuilt, which requires traversing every document in the table; the table
cannot be accessed during this time. Inplace table reorganization is therefore generally recommended,
because the system-generated XML indexes do not need to be rebuilt and the table remains accessible
during the reorganization (except during the truncation phase).
Procedure
1. To reorganize a table using the REORG TABLE command, simply specify the name of the table. For
example:
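The following sketch uses a hypothetical table named EMPLOYEE:

REORG TABLE employee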
You can reorganize a table using a specific temporary table space. For example:
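A sketch with a hypothetical temporary table space named TEMPSPACE1:

REORG TABLE employee USE tempspace1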
You can reorganize a table and have the rows reordered according to a specific index. For example:
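A sketch with a hypothetical index named EMP_IDX:

REORG TABLE employee INDEX emp_idx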
2. To reorganize a table using an SQL CALL statement, specify the REORG TABLE command with the
ADMIN_CMD procedure. For example:
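A sketch, again using the hypothetical EMPLOYEE table:

CALL SYSPROC.ADMIN_CMD('REORG TABLE employee')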
3. To reorganize a table using the administrative application programming interface, call the db2Reorg
API.
What to do next
After reorganizing a table, collect statistics on that table so that the optimizer has the most accurate data
for evaluating query access plans.
Procedure
• To reorganize a table online using the REORG TABLE command, specify the name of the table and the
INPLACE parameter.
For example:
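A sketch, using a hypothetical table named EMPLOYEE:

REORG TABLE employee INPLACE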
• To reorganize a table online using the administrative application programming interface, call the
db2Reorg API.
What to do next
After reorganizing a table, collect statistics on that table so that the optimizer has the most accurate data
for evaluating query access plans.
Procedure
1. To pause an online table reorganization using the REORG TABLE command, specify the name of the
table, the INPLACE parameter, and the PAUSE parameter.
For example:
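A sketch, using a hypothetical table named EMPLOYEE:

REORG TABLE employee INPLACE PAUSE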
When an online table reorg operation is paused, you cannot begin a new reorganization of that table.
You must either resume or stop the paused operation before beginning a new reorganization process.
Following a RESUME request, the reorganization process respects whatever truncation option is
specified on the current RESUME request. For example, if the NOTRUNCATE parameter is not specified
on the RESUME request, the table is truncated at the end of the reorganization, even if NOTRUNCATE
was specified when the reorganization was originally started or previously paused.
Procedure
• To access information about reorganization operations using SQL, use the SNAPTAB_REORG
administrative view.
For example, the following query returns details about table reorganization operations on all database
partitions for the currently connected database. If no tables have been reorganized, no rows are
returned.
select
substr(tabname, 1, 15) as tab_name,
substr(tabschema, 1, 15) as tab_schema,
reorg_phase,
substr(reorg_type, 1, 20) as reorg_type,
reorg_status,
reorg_completion,
dbpartitionnum
from sysibmadm.snaptab_reorg
order by dbpartitionnum
• To access information about reorganization operations using the snapshot monitor, use the GET
SNAPSHOT FOR TABLES command and examine the values of the table reorganization monitor
elements.
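For instance, assuming the currently connected database is named SAMPLE:

db2 get snapshot for tables on sample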
Results
Because offline table reorg operations are synchronous, errors are returned to the caller of the utility (an
application or the command line processor). And because online table reorg operations are
asynchronous, error messages in this case are not returned to the CLP. To view SQL error messages that
are returned during an online table reorg operation, use the LIST HISTORY REORG command.
An online table reorg operation runs in the background as the db2Reorg process. This process continues
running even if the calling application terminates its database connection.
Index reorganization
As tables are updated, index performance can degrade.
The degradation can occur in the following ways:
• Leaf pages become fragmented. When leaf pages are fragmented, I/O costs increase because more leaf
pages must be read to fetch table pages.
• The physical index page order no longer matches the sequence of keys on those pages, resulting in low
density indexes. When leaf pages have a low density, sequential prefetching is inefficient and the
number of I/O waits increases. However, if smart index prefetching is enabled, the query optimizer
switches to readahead prefetching if low density indexes exist. This action helps reduce the negative
impact that low density indexes have on performance.
• The index develops too many levels. In this case, the index might be reorganized.
Index reorganization requires:
• SYSADM, SYSMAINT, SYSCTRL, DBADM, or SQLADM authority, or CONTROL privilege on the table and
its indexes
• When the REBUILD option is chosen with the ALLOW READ ACCESS or ALLOW WRITE ACCESS option, free
space in the table space where the indexes are stored, equal to the current size of the indexes.
Consider placing indexes in a large table space when you issue the CREATE TABLE statement.
• Additional log space. The index REORG utility logs its activities.
If you specify the MINPCTUSED option on the CREATE INDEX statement, the database server
automatically merges index leaf pages if a key is deleted and the free space becomes less than the
specified value. This process is called online index defragmentation.
To restore index clustering, free up space, and reduce leaf levels, you can use one of the following
methods:
• Drop and re-create the index.
• Use the REORG TABLE command with options that reorganize the table and rebuild its indexes offline.
• Use the REORG INDEXES command with the REBUILD option to reorganize indexes online or offline.
You might choose online reorganization in a production environment. Online reorganization allows users
to read from or write to the table while its indexes are being rebuilt.
If your primary objective is to free up space, consider the CLEANUP and RECLAIM EXTENTS options of
the REORG command. See the related links for more details.
In IBM Data Studio Version 3.1 or later, you can use the task assistant for reorganizing indexes. Task
assistants can guide you through the process of setting options, reviewing the automatically generated
commands to perform the task, and running these commands. For more details, see Administering
databases with task assistants.
Procedure
Issue the db2pd command with the -reorgs index parameter:
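For example, assuming a database named SAMPLE:

db2pd -db sample -reorgs index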
Results
The following is an example of output obtained using the db2pd command with the -reorgs index
parameter, which reports the index reorganization progress for a range-partitioned table with two
partitions.
Procedure
To enable your database for automatic reorganization:
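A minimal sketch using the CLP, assuming a database named SAMPLE (automatic reorganization also
depends on the auto_maint and auto_tbl_maint parameters, which are ON by default):

db2 update db cfg for sample using AUTO_MAINT ON AUTO_TBL_MAINT ON AUTO_REORG ON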
Example
For an example of this capability, see the Related concepts.
Procedure
To enable automatic index reorganization in volatile tables, perform the following steps:
1. Set the DB2_WORKLOAD registry variable to SAP. The following example shows how to set this variable
using the db2set command:
db2set DB2_WORKLOAD=SAP
Warning: Setting the DB2_WORKLOAD registry variable to SAP will enable other registry
variables. You can check what these other variables are by running the following command:
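One way to do this is to display the group definition of the DB2_WORKLOAD aggregate registry variable:

db2set -gd DB2_WORKLOAD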
2. Set the auto_reorg database configuration parameter to ON. The following example shows how to
set this database configuration parameter using the Db2 CLP command line interface:
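For example, assuming a database named SAMPLE:

db2 update db cfg for sample using AUTO_REORG ON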
Ensure that the auto_maint and auto_tbl_maint database configuration parameters are also set
to ON. By default, auto_maint and auto_tbl_maint are set to ON.
3. Set the numInxPseudoEmptyPagesForVolatileTables attribute in the AUTO_REORG policy by calling
the AUTOMAINT_SET_POLICY or AUTOMAINT_SET_POLICYFILE procedure. This attribute indicates
the minimum number of empty index pages with pseudo deleted keys required to perform the index
reorganization. The following example shows how to set this attribute:
<ReorgTableScope maxOfflineReorgTableSize="0">
<FilterClause>TABSCHEMA NOT LIKE 'SYS%'</FilterClause>
</ReorgTableScope>
</DB2AutoReorgPolicy>')
)
You can monitor the values of the PSEUDO_EMPTY_PAGES, EMPTY_PAGES_DELETED, and
EMPTY_PAGES_REUSED columns by querying the MON_GET_INDEX table function to help you
determine an appropriate value for the numInxPseudoEmptyPagesForVolatileTables attribute.
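A sketch of such a monitoring query (the schema and table names are placeholders):

SELECT SUBSTR(TABNAME, 1, 20) AS TABNAME, IID,
       PSEUDO_EMPTY_PAGES, EMPTY_PAGES_DELETED, EMPTY_PAGES_REUSED
FROM TABLE(MON_GET_INDEX('MYSCHEMA', 'MYTABLE', -2)) AS t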
Note: Creating a public alias will fail with SQL1599N.
Scenario: ExampleBANK reclaiming table and index space - Space management policies
The database administrator at ExampleBANK, Olivia, struggled for years to effectively manage the size of
databases.
During the normal course of business operation, batch deletes are done on tables to get rid of data that is
no longer required. The tables and associated indexes have free unused space that cannot be used by any
other object in the same table space after the batch deletes are complete. ExampleBANK has a space
management policy in place to free this unused space. Each month Olivia takes the affected databases
offline so the tables and indexes can be reorganized. The reorganization of the objects frees the space. To
minimize downtime, the work is done during off-peak hours.
This table and index space management policy takes time and manual intervention. Also, because Olivia
takes the database offline to complete this task, the affected tables and indexes are not available to users
during the reorganization.
Olivia is told about new command and statement parameters to reclaim space from tables and indexes. A
new way to manage the space needed for tables and indexes is presented.
Scenario: ExampleBANK reclaiming table and index space - Creating an insert time clustering table
Insert time clustering (ITC) tables can help Olivia, and ExampleBANK, manage database size more
effectively without manual intervention or database downtime.
Olivia creates an insert time clustering table as a test. The ORGANIZE BY INSERT TIME clause ensures
that the table is created as an ITC table:
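A sketch of such a statement (the column definitions are hypothetical; TABLESPACE1 is the table space
used later in this scenario):

CREATE TABLE T1 (C1 INT, C2 CHAR(50), C3 BIGINT, C4 DATE)
  IN TABLESPACE1
  ORGANIZE BY INSERT TIME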
Scenario: ExampleBANK reclaiming table and index space - Evaluating the effectiveness of reclaiming
space from a table
Time passes and normal operations are run on the ITC table.
At some point a batch delete is run on this table, and large portions of the object become empty. Olivia
wants to make this trapped space available to other objects in TABLESPACE1. Olivia can evaluate the
amount of space that can be reclaimed from the table; the output reports the number of sparsely
populated blocks and the reclaimable space:
SPARSE_BLOCKS RECLAIMABLE_SPACE
-------------------- -----------------
7834 14826781647
1 record(s) selected.
Olivia notices that there are a significant number of blocks in this table that are sparsely populated with
data. A significant amount of space is available to be reclaimed. By running a reorganization on this table,
Olivia consolidates the remaining data in these blocks into a smaller group of blocks. A reorganization
also releases any fully empty blocks back to the table space. Using the REORG command, Olivia then
releases the reclaimable space now empty after the batch delete process back to the system:
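The command might look like the following sketch, using the RECLAIM EXTENTS option against table T1:

REORG TABLE T1 RECLAIM EXTENTS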
Note: The table remains fully available to all users while the REORG command is processed.
Olivia then repeats the command to determine how much space was released to the table space:
SPARSE_BLOCKS RECLAIMABLE_SPACE
-------------------- -----------------
1 30433
1 record(s) selected.
The result is that 14,826,751,224 KB of space formerly occupied by data is reclaimed. Because the
RECLAIM EXTENTS operation is an online operation, Olivia notes that the sparse blocks and reclaimable
space are not zero when the operation completes; other activity occurred on the table while the
RECLAIM EXTENTS operation ran.
Scenario: ExampleBANK reclaiming table and index space - Evaluating the effectiveness of reclaiming
space from an index
Olivia notes the space reclaimed from the data portion of the table T1 after a batch delete. Olivia knows
that there is some cleanup left to be done in the indexes for this table.
Reclaiming index space can recover space occupied by indexes while the same indexes are still available
for use by users.
As with reclaiming space from the table, the space in question is reclaimed back to the table space for
reuse by other objects.
Olivia uses the ADMIN_GET_INDEX_INFO function to see how much space can be reclaimed:
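A sketch of such a query, mirroring the one shown later in this chapter and assuming the EXMP schema
used in this scenario:

SELECT RECLAIMABLE_SPACE
FROM TABLE(sysproc.admin_get_index_info('T', 'EXMP', 'T1')) AS t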
1 record(s) selected.
REORG INDEXES ALL FOR TABLE T1 ALLOW WRITE ACCESS CLEANUP ALL RECLAIM EXTENTS
Olivia then repeats the command to determine how much space was released to the table space:
1 record(s) selected.
The result is an estimated 846,592 KB of space reclaimed. If the physical size after space is reclaimed is
subtracted from the original physical size, Olivia notes that the actual space reclaimed is 846,976 KB.
Scenario: ExampleBANK reclaiming table and index space - Converting an existing table to an insert
time clustering table
Olivia sees the benefit of using insert time clustering tables. Olivia now wants to use this solution on
existing tables in the production database. This change is accomplished by using the online table move
utility.
Olivia has a table that exists on a system with the following schema. In this scenario, assume that the
table actually has a column which is useful for placing data in approximate insert time order (C4).
The schema is identical to the original table but by using the ORGANIZE BY INSERT TIME keywords,
Olivia ensures that this table is clustered by time.
Olivia uses the online table move stored procedure to perform the conversion.
Since a clustering index exists on column C4, it gives Olivia a good approximation of insert time ordering.
For tables that do not have such a column, the space reclamation benefits of moving to an insert time
clustering table are not apparent for some time, because newer data is grouped together with older data.
EXMP.T1 is now in insert time clustering (ITC) table format. It is ready to have extents reclaimed after subsequent
batch deletions.
Scenario: ExampleBANK reclaiming table and index space - Improving row overflow performance
During normal SQL processing, a row value update might result in that row no longer fitting in the original
location in the database. When this scenario occurs, the database manager splits the row into two pieces.
In the original location, the row is replaced with a pointer. The location that the new pointer indicates is
where the larger and new copy of the row can be found. Any subsequent access to the updated row now
follows this pointer, causing performance degradation.
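A query such as the following sketch reports the overflow accesses for the table through the
MON_GET_TABLE table function (the EXMP schema is assumed from this scenario):

SELECT VARCHAR(TABNAME, 50) AS TABNAME, OVERFLOW_ACCESSES
FROM TABLE(MON_GET_TABLE('EXMP', 'T1', -2)) AS t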
TABNAME OVERFLOW_ACCESSES
--------------------------------------------------- --------------------
T1 172
1 record(s) selected.
Olivia notes that T1 might benefit from a CLEANUP reorganization to reduce the number of overflow
accesses. Olivia uses the following command on each table:
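A sketch of such a command, using the inplace CLEANUP OVERFLOWS option against table T1:

REORG TABLE T1 INPLACE CLEANUP OVERFLOWS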
Olivia can then rerun the original monitor command after this reorganization operation. Olivia notices the
number of new pointer or overflow accesses is reduced to 0.
TABNAME OVERFLOW_ACCESSES
--------------------------------------------------- --------------------
T1 0
1 record(s) selected.
Scenario: ExampleBANK reclaiming table and index space - General index maintenance
Olivia notes that, for some indexes and tables, space consumption and behavior are not closely tracked.
Periodically, a script can check whether any space in the affected table spaces can be cleaned up and
reclaimed.
Olivia uses the REORGCHK command to determine whether an index cleanup would be beneficial:
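For instance, for the table used later in this example:

REORGCHK ON TABLE USER1.TBL1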
Table statistics:
(output abbreviated)
Index statistics:
(output abbreviated; in the F4-F8 flag column, the value ---*- for index TBL1_INX1 indicates that formula F7 is flagged)
Tables defined using the ORGANIZE BY clause and the corresponding dimension
indexes have a '*' suffix to their names. The cardinality of a dimension index
is equal to the Active blocks statistic of the table.
Formula F7 in the output shows that, for the TBL1_INX1 index, an index cleanup using the REORG
command would be beneficial. Olivia issues the command to clean up the indexes:
REORG INDEXES ALL FOR TABLE USER1.TBL1 ALLOW WRITE ACCESS CLEANUP;
To determine how much space can be reclaimed now that the REORG INDEXES CLEANUP command
freed up space, Olivia uses the ADMIN_GET_INDEX_INFO routine:
SELECT RECLAIMABLE_SPACE
FROM TABLE(sysproc.admin_get_index_info('T','USER1', 'TBL1'))
AS t
RECLAIMABLE_SPACE
------------------
14736
1 record(s) selected.
If Olivia considers this value, in KB, to be significant, she can run the REORG INDEX RECLAIM EXTENTS
command:
REORG INDEXES ALL FOR TABLE USER1.TBL1 ALLOW WRITE ACCESS RECLAIM EXTENTS;
Olivia can schedule this work at regular intervals to ensure that the indexes in question do not hold more
space than required. This regularly scheduled work does not prohibit others from using the indexes in
question.
Olivia decides that the reclaimExtentsSizeForIndexObjects threshold must exceed 51,200 KB (50 MB)
before any automatic reorganization with the RECLAIM EXTENTS option is run. Olivia copies
DB2AutoReorgPolicySample.xml to a file called autoreorg_policy.xml and changes the line in the sample
to the following value:
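The changed attribute might look like the following sketch (other attributes that appear on the same
element in the sample file are not shown):

reclaimExtentsSizeForIndexObjects="51200"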
cp $HOME/autoreorg_policy.xml $HOME/sqllib/tmp/.
db2 "call sysproc.automaint_set_policyfile( 'AUTO_REORG', 'autoreorg_policy.xml')"
Application design
Database application design is one of the factors that affect application performance. Review this section
for details about application design considerations that can help you to maximize the performance of
database applications.
Concurrency issues
Because many users access and change data in a relational database, the database manager must allow
users to make these changes while ensuring that data integrity is preserved.
Concurrency refers to the sharing of resources by multiple interactive users or application programs at the
same time. The database manager controls this access to prevent undesirable effects, such as:
• Lost updates. Two applications, A and B, might both read the same row and calculate new values for
one of the columns based on the data that these applications read. If A updates the row and then B also
updates the row, A's update is lost.
• Access to uncommitted data. Application A might update a value, and B might read that value before it
is committed. Then, if A backs out of that update, the calculations performed by B might be based on
invalid data.
• Non-repeatable reads. Application A might read a row before processing other requests. In the
meantime, B modifies or deletes the row and commits the change. Later, if A attempts to read the
original row again, it sees the modified row or discovers that the original row has been deleted.
Isolation levels
The isolation level that is associated with an application process determines the degree to which the data
that is being accessed by that process is locked or isolated from other concurrently executing processes.
The isolation level is in effect for the duration of a unit of work.
The isolation level of an application process therefore specifies:
• The degree to which rows that are read or updated by the application are available to other concurrently
executing application processes
• The degree to which the update activity of other concurrently executing application processes can
affect the application
The isolation level for static SQL statements is specified as an attribute of a package and applies to the
application processes that use that package. The isolation level is specified during the program
preparation process by setting the ISOLATION bind or precompile option. For dynamic SQL statements,
the default isolation level is the isolation level that was specified for the package preparing the statement.
Use the SET CURRENT ISOLATION statement to specify a different isolation level for dynamic SQL
statements that are issued within a session. For more information, see "CURRENT ISOLATION special
register". For both static SQL statements and dynamic SQL statements, the isolation-clause in a select-
statement overrides both the special register (if set) and the bind option value. For more information, see
"Select-statement".
Isolation levels are enforced by locks, and the type of lock that is used limits or prevents access to the
data by concurrent application processes. Declared temporary tables and their rows cannot be locked
because they are only accessible to the application that declared them.
The database manager supports three general categories of locks:
Share (S)
Under an S lock, concurrent application processes are limited to read-only operations on the data.
Update (U)
Under a U lock, concurrent application processes are limited to read-only operations on the data, if
these processes have not declared that they might update a row. The database manager assumes
that the process currently looking at a row might update it.
Exclusive (X)
Under an X lock, concurrent application processes are prevented from accessing the data in any way.
This does not apply to application processes with an isolation level of uncommitted read, which can
read but not modify the data.
Note:
1. An example of the phantom read phenomenon is as follows: Unit of work UW1 reads the set of n rows
that satisfies some search condition. Unit of work UW2 inserts one or more rows that satisfy the
same search condition and then commits. If UW1 subsequently repeats its read with the same
search condition, it sees a different result set: the rows that were read originally plus the rows that
were inserted by UW2.
2. If your label-based access control (LBAC) credentials change between reads, results for the second
read might be different because you have access to different rows.
3. The isolation level offers no protection to the application if the application is both reading from and
writing to a table. For example, an application opens a cursor on a table and then performs an insert,
update, or delete operation on the same table. The application might see inconsistent data when
more rows are fetched from the open cursor.
4. An example of the non-repeatable read phenomenon is as follows: Unit of work UW1 reads a row.
Unit of work UW2 modifies that row and commits. If UW1 subsequently reads that row again, it might
see a different value.
5. An example of the dirty read phenomenon is as follows: Unit of work UW1 modifies a row. Unit of
work UW2 reads that row before UW1 commits. If UW1 subsequently rolls the changes back, UW2
has read nonexistent data.
6. Under UR or CS, if the cursor is not updatable, the current row can be updated or deleted by other
application processes in some cases. For example, buffering might cause the current row at the client
to be different from the current row at the server. Moreover, when using currently committed
semantics under CS, a row that is being read might have uncommitted updates pending. In this case,
the currently committed version of the row is always returned to the application.
The isolation level affects not only the degree of isolation among applications but also the performance
characteristics of an individual application, because the processing and memory resources that are
required to obtain and free locks vary with the isolation level. The potential for deadlocks also varies with
the isolation level. Table 8 on page 149 provides a simple heuristic to help you choose an initial isolation
level for your application.
Table 8. Guidelines for choosing an isolation level

Application type            High data stability required    High data stability not required
Read-write transactions     RS                              CS
Read-only transactions      RR or RS                        UR
Procedure
• At the statement or subselect level:
Note: Isolation levels for XQuery statements cannot be specified at the statement level.
Use the WITH clause. The WITH UR option applies to read-only operations only. In other cases, the
statement is automatically changed from UR to CS.
This isolation level overrides the isolation level that is specified for the package in which the statement
appears. You can specify an isolation level for the following SQL statements:
– DECLARE CURSOR
– Searched DELETE
– INSERT
– SELECT
– SELECT INTO
– Searched UPDATE
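For example, the following statement reads with uncommitted read at the statement level (the EMPLOYEE
table of the SAMPLE database is used here as a placeholder):

SELECT LASTNAME FROM EMPLOYEE WHERE WORKDEPT = 'A00' WITH UR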
• For dynamic SQL within the current session:
Use the SET CURRENT ISOLATION statement to set the isolation level for dynamic SQL issued within a
session. Issuing this statement sets the CURRENT ISOLATION special register to a value that specifies
the isolation level for any dynamic SQL statements that are issued within the current session. Once
set, the CURRENT ISOLATION special register provides the isolation level for any subsequent dynamic
SQL statement that is compiled within the session, regardless of which package issued the statement.
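For example:

SET CURRENT ISOLATION = UR

To determine the isolation level that a package was bound with, a query against the SYSCAT.PACKAGES
catalog view such as the following sketch can be used:

SELECT ISOLATION FROM SYSCAT.PACKAGES
WHERE PKGNAME = 'pkgname' AND PKGSCHEMA = 'pkgschema'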
where pkgname is the unqualified name of the package and pkgschema is the schema name of the
package. Both of these names must be specified in uppercase characters.
• When working with JDBC or SQLJ at run time:
Note: JDBC and SQLJ are implemented with CLI on Db2 servers, which means that the db2cli.ini
settings might affect what is written and run using JDBC and SQLJ.
To create a package (and specify its isolation level) in SQLJ, use the SQLJ profile customizer
(db2sqljcustomize command).
• From CLI or ODBC at run time:
Use the CHANGE ISOLATION LEVEL command. With Db2 Call-level Interface (CLI), you can change
the isolation level as part of the CLI configuration. At run time, use the SQLSetConnectAttr function
with the SQL_ATTR_TXN_ISOLATION attribute to set the transaction isolation level for the current
connection referenced by the ConnectionHandle argument. You can also use the TXNISOLATION
keyword in the db2cli.ini file.
• On database servers that support REXX:
When a database is created, multiple bind files that support the different isolation levels for SQL in
REXX are bound to the database. Other command line processor (CLP) packages are also bound to the
database when a database is created.
REXX and the CLP connect to a database using the default CS isolation level. Changing this isolation
level does not change the connection state.
To determine the isolation level that is being used by a REXX application, check the value of the
SQLISL predefined REXX variable. The value is updated each time that the CHANGE ISOLATION
LEVEL command executes.
• Changing the default isolation level used for new sessions:
The normal default isolation level used for dynamic SQL within a new session is determined by the
isolation level of the package being used in that session. While the application can change this value
during its processing, the database administrator can also change the default isolation level outside of
the application by one of the following means:
– Implementing a customized CONNECT procedure (Customizing an application environment using the
connect procedure)
– Setting the DB2_DEFAULT_ISOLATION_VALUE registry variable (General registry variables) and, if
desired, the DB2_BYPASS_DEFAULT_ISOLATION_APPS, DB2_BYPASS_DEFAULT_ISOLATION_GROUPS, or
DB2_BYPASS_DEFAULT_ISOLATION_USERS registry variable
Restrictions
The following restrictions apply to currently committed semantics:
• The target table object in a section that is to be used for data update or deletion operations does not
use currently committed semantics. Rows that are to be modified must be lock protected to ensure that
they do not change after they have satisfied any query predicates that are part of the update operation.
• A transaction that makes an uncommitted modification to a row forces the currently committed reader
to access appropriate log records to determine the currently committed version of the row. Although log
records that are no longer in the log buffer can be physically read, currently committed semantics do
not support the retrieval of log files from the log archive. This affects only databases that you configure
to use infinite logging.
• The following scans do not use currently committed semantics:
– Catalog table scans. Currently committed semantics apply only to read-only scans that do not
involve internal scans that are used to evaluate or enforce constraints. Currently committed semantics
do not apply to internal scans on catalog tables, but may be applied to external scans on catalog tables
if the DB2COMPOPT registry variable is set to LOCKAVOID_EXT_CATSCANS.
– Scans that are used to enforce referential integrity constraints
– Scans that reference LONG VARCHAR or LONG VARGRAPHIC columns
– Range-clustered table (RCT) scans
– Scans that use spatial or extended indexes
Monitoring
Currently committed row data retrievals can be monitored on a per-table basis through the db2pd
-tcbstats option. See the CCLogReads, CCRemoteReqs, CCLockWaits, and CCRemRetryLckWs values in
the db2pd command documentation.
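For example, assuming a database named SAMPLE:

db2pd -db sample -tcbstats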
Examples
Example 1:
Consider the following scenario, in which deadlocks are avoided by using currently committed semantics.
In this scenario, two applications update two separate tables, as shown in step 1, but do not yet commit.
Each application then attempts to use a read-only cursor to read from the table that the other application
updated, as shown in step 2. These applications are running under the CS isolation level.
Without currently committed semantics, these applications running under the cursor stability isolation
level might create a deadlock, causing one of the applications to fail. This happens when each application
must read data that is being updated by the other application.
Under currently committed semantics, if one of the applications that is running a query in step 2 requires
the data that is being updated by the other application, the first application does not wait for the lock to
be released. As a result, a deadlock is impossible. The first application locates and uses the previously
committed version of the data instead.
Example 2:
Consider the following scenario, in a Db2® pureScale® environment, in which an application avoids a lock
wait condition. Application-A on member 1 has updated data in table T1 but has not yet committed its
update; application-B on another member then attempts to read that data.
Without currently committed semantics, application-B would wait until application-A committed its
update and released the row lock, before reading the data. Under currently committed semantics,
application-B will use the previously committed version of the data instead.
Example
The following example provides a comparison between the default locking behavior and the evaluate
uncommitted behavior. The table is the ORG table from the SAMPLE database.
The following transactions occur under the default cursor stability (CS) isolation level.
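A sketch of the two sessions (hypothetical statements; the first row of the ORG table has DEPTNUMB 10,
and the predicate values are chosen to match the discussion that follows):

Session 1:
   update org set deptnumb = 5 where deptnumb = 10

Session 2:
   select deptnumb, deptname from org where deptnumb > 5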
The uncommitted UPDATE statement in Session 1 holds an exclusive lock on the first row in the table,
preventing the query in Session 2 from returning a result set, even though the row being updated in
Session 1 does not currently satisfy the query in Session 2. The CS isolation level specifies that any row
that is accessed by a query must be locked while the cursor is positioned on that row. Session 2 cannot
obtain a lock on the first row until Session 1 releases its lock.
Waiting for a lock in Session 2 can be avoided by using the evaluate uncommitted feature, which first
evaluates the predicate and then locks the row. As such, the query in Session 2 would not attempt to lock
the first row in the table, thereby increasing application concurrency. Note that this also means that
predicate evaluation in Session 2 would occur with respect to the uncommitted value of deptnumb=5 in
Session 1. The query in Session 2 would omit the first row in its result set, despite the fact that a rollback
of the update in Session 1 would satisfy the query in Session 2.
If the order of operations were reversed, concurrency could still be improved with the evaluate
uncommitted feature. Under default locking behavior, Session 2 would first acquire a row lock prohibiting
the searched UPDATE in Session 1 from executing, even though the Session 1 UPDATE statement would
not change the row that is locked by the Session 2 query. If the searched UPDATE in Session 1 first
attempted to examine rows and then locked them only if they qualified, the Session 1 query would be
non-blocking.
Restrictions
• The DB2_EVALUNCOMMITTED registry variable must be enabled.
• The isolation level must be CS or RS.
• Row-level locking is in effect.
• SARGable evaluation predicates exist.
• Evaluate uncommitted is not applicable to scans on the system catalog tables.
• For multidimensional clustering (MDC) or insert time clustering (ITC) tables, block-level locking can be
deferred for an index scan; however, block-level locking cannot be deferred for table scans.
• Lock deferral will not occur on a table that is executing an inplace table reorganization.
• For Iscan-Fetch plans, row-level locking is not deferred to the data access; rather, the row is locked
during index access before moving to the row in the table.
• Deleted rows are unconditionally skipped during table scans, but deleted index keys are skipped only if
the DB2_SKIPDELETED registry variable is enabled.
A join predicate that applies an expression to a join column does not benefit from a hash join; a nested
loop join is used instead. The same problem applies to expressions over columns in local predicates. For
example, predicates such as:
XPRESSN(C) = 'constant'
INTEGER(TRANS_DATE)/100 = 200802
should be rewritten so that the column appears alone on one side of the comparison:
C = INVERSEXPRESSN('constant')
TRANS_DATE BETWEEN 20080201 AND 20080229
Applying expressions over columns prevents the use of index start and stop keys, leads to inaccurate
selectivity estimates, and requires extra processing at query execution time.
These expressions also prevent query rewrite optimizations such as recognizing when columns are
equivalent, replacing columns with constants, and recognizing when at most one row will be returned.
Further optimizations are possible after it can be proven that at most one row will be returned, so the lost
optimization opportunities are further compounded. Consider the following query:
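A sketch of the kind of query in question (the CUSTOMER table and the expression are hypothetical; the
literal values match the discussion that follows):

select cust_id, cust_code, cust_name
from customer
where (cust_id * 100) + integer(cust_code) = 123456
order by cust_name

The rewritten version, with each column alone on one side of its predicate:

select cust_id, cust_code, cust_name
from customer
where cust_id = 1234 and cust_code = '56'
order by cust_name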
If there is a unique index defined on CUST_ID, the rewritten version of the query enables the query
optimizer to recognize that at most one row will be returned. This avoids introducing an unnecessary
SORT operation. It also enables the CUST_ID and CUST_CODE columns to be replaced by 1234 and '56',
avoiding copying values from the data or index pages. Finally, it enables the predicate on CUST_ID to be
applied as an index start or stop key.
It might not always be apparent when an expression is present in a predicate. This can often occur with
queries that reference views when the view columns are defined by expressions. For example, consider
the following view definition and query:
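A sketch of such a view and query (names are hypothetical; the view column is defined with an expression
over the base column):

create view cust_v as
  (select cust_id, upper(cust_name) as cust_name from customer);

select * from cust_v where cust_name = 'SMITH'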
The query optimizer merges the query with the view definition, resulting in the following query:
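Continuing the sketch above, the merged form applies the expression to the base column in the predicate:

select * from customer where upper(cust_name) = 'SMITH'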
Support for case-insensitive search, which was introduced in Db2 Database for Linux, UNIX, and Windows
Version 9.5 Fix Pack 1, is designed to resolve the situation in this particular example. You can use the _Sx
attribute on a locale-sensitive UCA-based collation to control the strength of the collations. For example,
a locale-sensitive UCA-based collation with the attributes _LFR_S1 is a French collation that ignores case
and accent.
SELECT…
FROM PRODUCT P, SALES F
WHERE
P.PROD_KEY = F.PROD_KEY AND
F.SALE_DATE BETWEEN P.START_DATE AND
P.END_DATE
Specialized star schema joins, such as star join with index ANDing and hub joins, are not considered if
there are any non-equality join predicates in the query block. (See "Ensuring that queries fit the required
criteria for the star schema join".)
SELECT...
FROM DAILY_SALES F
LEFT OUTER JOIN CUSTOMER C ON F.CUST_KEY = C.CUST_KEY
LEFT OUTER JOIN STORE S ON F.STORE_KEY = S.STORE_KEY
WHERE
C.CUST_NAME = 'SMITH'
The left outer join can prevent a number of optimizations, including the use of specialized star-schema
join access methods. However, in some cases the left outer join can be automatically rewritten to an inner
join by the query optimizer. In this example, the left outer join between CUSTOMER and DAILY_SALES can
be converted to an inner join because the predicate C.CUST_NAME = 'SMITH' will remove any rows
with null values in this column, making a left outer join semantically unnecessary. So the loss of some
optimizations due to the presence of outer joins might not adversely affect all queries. However, it is
important to be aware of these limitations and to avoid outer joins unless they are absolutely required.
Ensuring that queries fit the required criteria for the star schema join
The optimizer considers three specialized join methods for queries based on star schema: star join,
Cartesian hub join, and zigzag join. These join methods can help to significantly improve performance for
such queries.
A query must meet the following criteria to be recognized as a star schema for the purposes of a zigzag
join, star join, or Cartesian hub join plan.
• It must be a star-shaped query with one fact table and at least two dimension tables. If the query
includes more than one fact table with common associated dimension tables (a multiple fact table
query), the query optimizer will split the query into a query with multiple stars in it. The common
dimension tables that join with more than one fact table are then used multiple times in the query. The
explain output will show multiple zigzag join operators for these multiple fact table queries.
• The dimension tables must have a primary key, a unique constraint, or a unique index defined on them;
the primary key can be a composite key. If the dimension tables do not have a primary key, a unique
constraint, or a unique index, then an older star detection method is used to detect a star for Cartesian
hub and star join methods. In that case, the Cartesian hub and star join must meet the criteria described
in “Alternative Cartesian hub join and star join criteria” on page 162.
• The dimension tables and the fact table must be joined using equijoin predicates on all columns that are
part of the primary keys for the dimension tables.
• For Cartesian hub joins and zigzag joins, there must be a multicolumn index on the fact table; columns
that participate in the join are part of that index, which must have enough join columns from the fact
table that at least two dimension tables are covered.
• For Cartesian hub joins and star index ANDing joins, a dimension table or a snowflake must filter the fact
table. (Filtering is based on the optimizer's estimates.) There are also cases where a star join will still
occur if the dimension table is joined with the fact table not as a filter, but as a simple look-up type of
join.
For example, suppose that there are three dimension tables D1, D2, and D3. Dimension table D1 has
primary key A, and it joins with the fact table on column A; dimension table D2 has primary key (B,C), and
it joins with the fact table on columns B and C; finally, dimension table D3 has primary key D, and it joins
with the fact table on column D. Supported index usage is as follows:
• Any one of the following indexes would suffice, because each of these indexes covers at least two
dimension tables: (A,D), (A,B,C), or (C,B,D).
• Index (A,B,C,D) is also suitable, because it covers three dimension tables.
• Index (A,B) cannot be used, because it does not completely cover dimension table D2.
• Index (B,A,C) cannot be used, because columns B and C, which join with the primary key of D2, do not
appear in contiguous positions in the index.
A dimension table cannot participate in any of these join methods (zigzag join, star join, or Cartesian hub
join) if any of the following occurs:
Procedure
1. Ensure that the tables included in the zigzag join fit the required criteria.
Each dimension table must have one of the following properties: a primary key, a unique constraint, or a
unique index (without random ordering of index keys) defined on it. To define primary keys, unique
constraints, or unique indexes, use commands such as the following example:
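For instance, primary keys that are consistent with the join columns used in the query below (the
constraint names are hypothetical, and the key columns are assumed to be defined as NOT NULL):

alter table dim1 add constraint dim1_pk primary key (d0, d1);
alter table dim2 add constraint dim2_pk primary key (d2);
alter table dim3 add constraint dim3_pk primary key (d3);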
select count(*)
from dim1,dim2,dim3,fact
where dim1.d0 = fact.f0
and dim1.d1 = fact.f1
and dim2.d2 = fact.f2
and dim3.d3 = fact.f3
and dim1.c1 = 10
and dim2.c2 < 20;
If no suitable multicolumn index exists, an informational diagnostic message is displayed in the output
of the db2exfmt command.
4. Run the query in EXPLAIN mode and then issue the db2exfmt command to format the EXPLAIN
output.
Examine the output to determine whether the zigzag join was used and whether the wanted
performance was achieved.
5. Optional: If the zigzag join method was not used or if the wanted performance was not achieved, you
might want to create another multicolumn index.
Review the “Extended diagnostic information” section of db2exfmt command output. If an error
message is listed in the output, follow the suggestions (to generate a new index, for instance).
6. Optional: If the wanted performance was not achieved, determine whether there was a gap in the
index.
Review the gap information (Gap Info) section in the db2exfmt output.
If the section indicates that the query contains predicates that are inconsistent with a composite
index, consider creating a new index or modifying an existing index to avoid the index gap.
Three types of fact table access plans are possible with a zigzag join.
• An index scan-fetch plan: In this plan, the index scan accesses the index over the fact table to retrieve
RIDs from the fact table matching the input probe values. These fact table RIDs are then used to fetch
the necessary fact table data from the fact table. Any dimension table payload columns are then
retrieved from the dimension table and the result row is output by the zigzag join operator.
• A single probe list-prefetch plan: In this plan, a list prefetch plan is executed for every probe row from
the combination of dimension tables and snowflakes. The index scan over the fact table finds fact table
RIDs matching the input probe values. The SORT, RIDSCAN, and FETCH operators sort RIDs according
to data page identifiers and list prefetchers start to get the fact table data. Any dimension table payload
columns are then retrieved from the dimension tables and the result row is output by the zigzag join
operator.
• An all-probes list-prefetch plan: In this plan, the index scan accesses the fact table index for all the
probes from the combination of dimension tables and snowflakes. All such matching RIDs are sorted
according to fact table data page identifiers, and list prefetching retrieves the corresponding fact table
data; the fact table rows are then joined back with the dimension tables to produce the result rows.
2.6623e+06
ZZJOIN
( 5)
7620.42
5.37556
+------------------+------------------+
292.2 40000 0.227781
TBSCAN TBSCAN FETCH
( 6) ( 9) ( 13)
56.2251 7596.78 11.8222
1 2.92 1.22778
| | /---+----\
292.2 40000 0.227781 6.65576e+08
TEMP TEMP IXSCAN TABLE: POPS
( 7) ( 10) ( 14) DAILY_SALES
30.4233 4235.52 9.93701 Q3
1 2.92 1
| | |
292.2 40000 6.65576e+08
IXSCAN FETCH INDEX: POPS
( 8) ( 11) PER_CUST_ST_PROMO
29.9655 4235.07 Q3
1 2.92
| /---+----\
2922 40000 1e+06
INDEX: POPS IXSCAN TABLE: POPS
PERX1 ( 12) CUSTOMER
Q1 2763.52 Q2
1
|
1e+06
INDEX: POPS
CUSTX1
Q2
IS_TEMP_INDEX : True/False
The scan builds an index over the temp for random access of the temp.
(If the flag is 'true')
The scan builds a fast integer sort structure for random access of the temp.
(If the flag is 'false')
The TBSCAN(6) and TBSCAN(9) operators show the information regarding the feedback predicates
applied to the operators, in the form of start-stop key conditions.
Predicates:
----------
5) Start Key Predicate,
Comparison Operator: Equal (=)
Subquery Input Required: No
Filter Factor: 0.000342231
Predicate Text:
--------------
(Q1.PERKEY = Q3.PERKEY)
The ZZJOIN(5) operator shows the collection of all the feedback predicates used in the processing of
zigzag join.
Predicates:
----------
4) Feedback Predicate used in Join,
Comparison Operator: Equal (=)
Subquery Input Required: No
Filter Factor: 1e-06
Predicate Text:
--------------
(Q3.CUSTKEY = Q2.CUSTKEY)
Predicate Text:
--------------
(Q1.PERKEY = Q3.PERKEY)
2.6623e+06
ZZJOIN
( 5)
1.10517e+06
5.37556
+------------------+---+-----------------------+
292.2 40000 0.227781
TBSCAN TBSCAN FETCH
( 6) ( 9) ( 13)
56.2251 7596.78 548787
1 2.92 1.22778
| | /----+----\
292.2 40000 0.227781 6.65576e+08
TEMP TEMP RIDSCN TABLE: POPS
( 7) ( 10) ( 14) DAILY_SALES
30.4233 4235.52 319827 Q3
1 2.92 1
| | |
292.2 40000 0.227781
IXSCAN FETCH SORT
( 8) ( 11) ( 15)
29.9655 4235.07 319827
1 2.92 1
| /---+----\ |
2922 40000 1e+06 0.227781
INDEX: POPS IXSCAN TABLE: POPS IXSCAN
PERX1 ( 12) CUSTOMER ( 16)
Q1 2763.52 Q2 10.0149
1 1
| |
1e+06 6.65576e+08
INDEX: POPS INDEX: POPS
CUSTX1 PER_CUST_ST_PROMO
Q2 Q3
This shows that the difference between the index-scan plan and the single-probe plan is the way in which
the fact table is accessed.
All other operators show the same information as the operators in the previous example.
2.6623e+06
ZZJOIN
( 2)
78132.52
27.81
|
2.6623e+06
FETCH
( 3)
65524.23
27.81
|
2.6623e+06
RIDSCN
( 4)
56514.23
4.92
|
2.6623e+06
SORT
( 5)
56514.23
4.92
|
2.6623e+06
ZZJOIN
( 6)
7616.65
4.92
+---------------+--+------------+
292.2 40000 0.227781
TBSCAN TBSCAN IXSCAN
( 7) ( 10) ( 14)
56.2251 7596.78 9.93701
1 2.92 1
| | |
292.2 40000 6.65576e+08
TEMP TEMP INDEX: POPS
( 8) ( 11) PER_CUST_ST_PROMO
30.4233 4235.52 Q3
1 2.92
| |
292.2 40000
IXSCAN FETCH
( 9) ( 12)
29.9655 4235.07
1 2.92
| /---+----\
2922 40000 1e+06
INDEX: POPS IXSCAN TABLE: POPS
PERX1 ( 13) CUSTOMER
Q1 2763.52 Q2
1
|
1e+06
INDEX: POPS
CUSTX1
Q2
Compared to the other access plans, the all probes list-prefetch plan shows an additional operator,
ZZJOIN (2). This operator is being used to perform back-joins of the fact table with the dimension tables.
It shows the following information:
Backjoin = True
select count(*)
from d1, d3, d4, d5, f1
where d1.pk = f1.fk1 and d3.pk = f1.fk3 and d4.pk = f1.fk4 and d5.pk = f1.fk5
The query joins dimensions d1, d3, d4, d5 with fact table f1. Because the dimension d2 is not included in
the query, there is no join predicate with the dimension d2 on the column fk2. The query optimizer
recognizes fact column fk2 as a gap in the index and is able to use the index for a zigzag join.
The db2exfmt command output shows that the index scan is a jump scan, by indicating the
JUMPSCAN=TRUE option. The output also shows the index gap information, specifically that the second
index column has a positioning gap and the other columns do not.
Rows
RETURN
( 1)
Cost
I/O
|
1
GRPBY
( 2)
1539.45
33
|
1000
ZZJOIN
( 3)
1529.44
33
+----------------+----------------++---------------+------------------+
1000 1000 1000 1000 1000
TBSCAN TBSCAN TBSCAN TBSCAN FETCH
( 4) ( 9) ( 14) ( 19) ( 24)
184.085 184.085 184.085 184.085 205.222
8 8 8 8 1
| | | | /---+----\
1000 1000 1000 1000 1000 1000
TEMP TEMP TEMP TEMP RIDSCN TABLE: STAR
( 5) ( 10) ( 15) ( 20) ( 25) F1
184.003 184.003 184.003 184.003 55.5857 Q1
8 8 8 8 1
| | | | |
1000 1000 1000 1000 1000
TBSCAN TBSCAN TBSCAN TBSCAN SORT
( 6) ( 11) ( 16) ( 21) ( 26)
178.62 178.62 178.62 178.62 55.5342
8 8 8 8 1
| | | | |
1000 1000 1000 1000 1e-09
SORT SORT SORT SORT IXSCAN
( 7) ( 12) ( 17) ( 22) ( 27)
178.569 178.569 178.569 178.569 12.0497
8 8 8 8 1
| | | | |
1000 1000 1000 1000 1000
TBSCAN TBSCAN TBSCAN TBSCAN INDEX: STAR
( 8) ( 13) ( 18) ( 23) I11
135.093 135.093 135.093 135.093 Q1
select count(*)
from d2, d3, d4, f1
where d2.pk = f1.fk2 and d3.pk = f1.fk3 and d4.pk = f1.fk4 and fk1=10
In this query, dimensions d2, d3, and d4 join with the fact table f1. There is no join predicate on fact
column fk1; there is only a local predicate fk1=10.
The query optimizer recognizes the fact column fk1 as a gap, because there is no join predicate on it. The
query optimizer is still able to use the index for zigzag join.
The db2exfmt command output shows that the index scan is a jump scan, by indicating the
JUMPSCAN=TRUE option. The output also shows the index gap information, specifically that the first
index column has a positioning gap and the other columns do not.
Rows
RETURN
( 1)
Cost
I/O
|
1
GRPBY
( 2)
893.899
25.12
|
40
HSJOIN
( 3)
893.489
25.12
/------------+-------------\
1000 40
TBSCAN ZZJOIN
( 4) ( 5)
135.093 750.88
8 17.12
| +----------------++-----------------+
1000 1000 1000 40
TABLE: STAR TBSCAN TBSCAN FETCH
D4 ( 6) ( 11) ( 16)
Q2 184.085 184.085 18.1845
8 8 1.12004
| | /---+----\
1000 1000 40 1000
TEMP TEMP RIDSCN TABLE: STAR
( 7) ( 12) ( 17) F1
184.003 184.003 13.4358 Q1
8 8 1.12
| | |
1000 1000 40
The Db2 optimizer does not recognize the predicates as identical, and treats them as independent. This
leads to underestimation of cardinalities, suboptimal query access plans, and longer query run times.
For that reason, the redundant predicates are removed by the Db2 database platform-specific software
layer.
These predicates are transformed so that only the predicates on the fact table column
"SID_0CALMONTH" remain.
Apply the instructions in SAP notes 957070 and 1144883 to remove the redundant predicates.
Informational constraints must not be violated, otherwise queries might return incorrect results. In this
example, if any rows in DAILY_SALES do not have a corresponding customer key in the CUSTOMER table,
the query would incorrectly return those rows.
Another type of informational constraint is the NOT ENFORCED NOT TRUSTED constraint. It can be useful
to specify this type of informational constraint if an application cannot verify that the rows of a table will
conform to the constraint. The NOT ENFORCED NOT TRUSTED constraint can be used to improve query
optimization in cases where the Db2 optimizer can use the data to infer statistics from a statistical view.
In these cases the strict matching between the values in the foreign keys and the primary keys is not
needed. If a constraint is NOT TRUSTED and enabled for query optimization, then it will not be used to
perform optimizations that depend on the data conforming completely to the constraint, such as join
elimination.
When RI (referential integrity) tables are related by informational constraints, the informational
constraints might be used in the incremental maintenance of dependent MQT data, staging tables, and
query optimization. Violating an informational constraint might result in inaccurate MQT data and query
results.
For example, parent and child tables are related by informational constraints, so the order in which they
are maintained affects query results and MQT integrity. If there is data in the child table that cannot be
related to a row in the parent table, an orphan row has been created. Orphan rows are a violation of the
informational constraint relating that parent and child table. The dependent MQT data and staging tables
associated with the parent-child tables might be updated with incorrect data, resulting in unpredictable
optimization behavior.
If you have an ENFORCED informational constraint, Db2 will force you to maintain RI tables in the correct
order. For example, if you deleted a row in a parent table that would result in an orphan row, Db2 returns
an SQL error and rolls back the change.
If you have a NOT ENFORCED informational constraint, you must maintain the integrity of the RI tables by
updating tables in the correct order. The order in which parent-child tables are maintained is important to
ensure MQT data integrity.
For example, you have set up the following parent and child table with a corresponding MQT:
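A sketch of the parent and child table definitions that are consistent with the constraint and MQT that
follow (the column types are assumed):

create table parent (i1 int not null primary key, i2 int);
create table child (i1 int, i2 int);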
alter table child add constraint fk1 foreign key (i2) references parent (i1) not enforced enable query optimization;
create table mqt1 as (select p.i1 as c1, p.i2 as c2, c.i1 as c3, count (*) as cnt from parent p, child c
where p.i1 = c.i2 group by p.i1, p.i2, c.i1) data
initially deferred refresh immediate;
commit;
To insert rows into parent-child tables, you must insert rows into the parent table first.
If rows are inserted into the child table first, orphan rows exist while there is no row in the parent table
that matches the child row's foreign key. This violation, although temporary, might produce unpredictable
behavior during query optimization and MQT processing.
To remove rows from the parent table, you must remove the related rows from the child table first.
If rows are removed from the parent table first, orphan rows are created when a child row's foreign key no
longer matches a row key in the parent table. This results in a violation of the informational constraint
between the parent and child tables. This violation, although temporary, might produce unpredictable
behavior during query optimization and MQT processing.
Using the REOPT bind option with input variables in complex queries
Input variables are essential for good statement preparation times in an online transaction processing
(OLTP) environment, where statements tend to be simpler and query access plan selection is more
straightforward.
Multiple executions of the same query with different input variable values can reuse the compiled access
section in the dynamic statement cache, avoiding expensive SQL statement compilations whenever the
input values change.
However, input variables can cause problems for complex query workloads, where query access plan
selection is more complex and the optimizer needs more information to make good decisions. Moreover,
statement compilation time is usually a small component of total execution time, and business
intelligence (BI) queries, which do not tend to be repeated, do not benefit from the dynamic statement
cache.
If input variables need to be used in a complex query workload, consider using the REOPT(ALWAYS) bind
option. The REOPT bind option defers statement compilation from PREPARE to OPEN or EXECUTE time,
when the input variable values are known. The values are passed to the SQL compiler so that the
optimizer can use the values to compute a more accurate selectivity estimate. REOPT(ALWAYS) specifies
that the statement should be recompiled for every execution. REOPT(ALWAYS) can also be used for
complex queries that reference special registers, such as WHERE TRANS_DATE = CURRENT DATE - 30
DAYS, for example. If input variables lead to poor access plan selection for OLTP workloads, and
REOPT(ALWAYS) results in excessive overhead due to statement compilation, consider using
REOPT(ONCE) for selected queries. REOPT(ONCE) defers statement compilation until the first input
variable value is bound. The SQL statement is compiled and optimized using this first input variable value.
Subsequent executions of the statement with different values reuse the access section that was compiled
on the basis of the first input value. This can be a good approach if the first input variable value is
representative of subsequent values, and it provides a better query access plan than one that is based on
default values when the input variable values are unknown.
There are a number of ways that REOPT can be specified:
• For embedded SQL in C/C++ applications, use the REOPT bind option. This bind option affects re-
optimization behavior for both static and dynamic SQL.
• For CLI applications, set the REOPT value in one of the following ways:
– Use the REOPT keyword setting in the db2cli.ini configuration file. The values and corresponding
options are:
- 2 = SQL_REOPT_NONE
- 3 = SQL_REOPT_ONCE
- 4 = SQL_REOPT_ALWAYS
– Use the SQL_ATTR_REOPT connection or statement attribute.
– Use the SQL_ATTR_CURRENT_PACKAGE_SET connection or statement attribute to specify either the
NULLID, NULLIDR1, or NULLIDRA package sets. NULLIDR1 and NULLIDRA are reserved package set
names. When used, REOPT ONCE or REOPT ALWAYS are implied, respectively. These package sets
have to be explicitly created with the following commands:
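A sketch of those bind commands (db2clipk.bnd is the CLI packages bind file shipped in the sqllib/bnd
directory):

db2 bind db2clipk.bnd collection NULLIDR1
db2 bind db2clipk.bnd collection NULLIDRA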
• For JDBC applications that use the IBM Data Server Driver for JDBC and SQLJ, specify the -reopt value
when you run the DB2Binder utility.
• For SQL PL procedures, use one of the following approaches:
– Use the SET_ROUTINE_OPTS stored procedure to set the bind options that are to be used for the
creation of SQL PL procedures within the current session. For example, call:
sysproc.set_routine_opts('reopt always')
– Use the DB2_SQLROUTINE_PREPOPTS registry variable to set the SQL PL procedure options at the
instance level. Values set using the SET_ROUTINE_OPTS stored procedure will override those
specified with DB2_SQLROUTINE_PREPOPTS.
You can also use optimization profiles to set REOPT for static and dynamic statements, as shown in the
following example:
Even relatively simple SQL statements can result in excessive system CPU usage due to statement
compilation, if they are run very frequently. If your system experiences this type of performance problem,
consider changing the application to use parameter markers to pass predicate values to the Db2
compiler, rather than explicitly including them in the SQL statement. However, the access plan that is chosen on the basis of parameter markers might not be optimal for all possible input values.
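As an illustration (a minimal sketch; the table and column names are placeholders, not taken from the original example):

   -- literal embedded in the statement text: each distinct value produces a new
   -- statement, so frequent executions can trigger frequent compilations
   select lastname from employee where workdept = 'D11'

   -- parameter marker: a single statement text whose compiled access section can
   -- be reused from the dynamic statement cache for any input value
   select lastname from employee where workdept = ?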
Although you have previously defined a descending index on the SALARY column, this index is likely to be
poorly clustered, because employees are ordered by employee number. To avoid many random
synchronous I/Os, the optimizer would probably choose the list prefetch access method, which requires
sorting the row identifiers of all rows that qualify. This sort causes a delay before the first qualifying rows
can be returned to the application. To prevent this delay, add the OPTIMIZE FOR clause to the statement
as follows:
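Based on the surrounding description (a query that returns the 20 highest-paid employees, ordered by SALARY), the statement was presumably similar to:

   select lastname, firstnme, empno, salary
     from employee
     order by salary desc
     optimize for 20 rows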
In this case, the optimizer will likely choose to use the SALARY index directly, because only the 20
employees with the highest salaries are retrieved. Regardless of how many rows might be blocked, a
block of rows is returned to the client every twenty rows.
With the OPTIMIZE FOR clause, the optimizer favors access plans that avoid bulk operations or flow
interruptions, such as those that are caused by sort operations. You are most likely to influence an access
path by using the OPTIMIZE FOR 1 ROW clause. Using this clause might have the following effects:
• Join sequences with composite inner tables are less likely, because they require a temporary table.
• The join method might change. A nested loop join is the most likely choice, because it has low overhead
cost and is usually more efficient when retrieving a few rows.
• An index that matches the ORDER BY clause is more likely, because no sort is required for the ORDER
BY.
• List prefetching is less likely, because this access method requires a sort.
• Sequential prefetching is less likely, because only a small number of rows is required.
• In a join query, the table with columns in the ORDER BY clause is likely to be chosen as the outer table if
an index on the outer table provides the ordering that is needed for the ORDER BY clause.
Although the OPTIMIZE FOR clause applies to all optimization levels, it works best for optimization class
3 and higher, because classes lower than 3 use the greedy join enumeration search strategy. This method
sometimes results in access plans for multi-table joins that do not lend themselves to quick retrieval of
the first few rows.
If a packaged application uses the call-level interface (CLI or ODBC), you can use the
OPTIMIZEFORNROWS keyword in the db2cli.ini configuration file to have CLI automatically append an
OPTIMIZE FOR clause to the end of each query statement.
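For example, a db2cli.ini entry of the following form (SAMPLE is a placeholder for your database alias) would cause CLI to append OPTIMIZE FOR 20 ROWS to each query:

   [SAMPLE]
   OPTIMIZEFORNROWS=20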
When data is selected from nicknames, results can vary depending on data source support. If the data
source that is referenced by a nickname supports the OPTIMIZE FOR clause, and the Db2 optimizer
pushes the entire query down to the data source, then the clause is generated in the remote SQL that is
sent to the data source. If the data source does not support this clause, or if the optimizer decides that
the least costly plan is local execution, the OPTIMIZE FOR clause is applied locally. In this case, the Db2
optimizer prefers access plans that minimize the response time for retrieving the first few rows of a query,
but the options that are available to the optimizer for generating plans are slightly limited, and
performance gains from the OPTIMIZE FOR clause might be negligible.
If the OPTIMIZE FOR clause and the FETCH FIRST clause are both specified, the lower of the two n values
affects the communications buffer size. The two values are considered independent of each other for
optimization purposes.
Procedure
To specify row blocking:
1. Use the values of the aslheapsz and rqrioblk configuration parameters to estimate how many
rows are returned for each block. In both formulas, orl is the output row length, in bytes.
• Use the following formula for local applications:
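Based on the parameters named above, the formulas are presumably of the following form (aslheapsz is expressed in 4-KB pages and rqrioblk in bytes; both formulas are shown here for completeness):

   Rows per block = aslheapsz * 4096 / orl     (local applications)
   Rows per block = rqrioblk / orl             (remote applications)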
2. To enable row blocking, specify an appropriate value for the BLOCKING option on the BIND or PREP
command.
If you do not specify the BLOCKING option, the default row blocking type is UNAMBIG. For the
command line processor (CLP) and the call-level interface (CLI), the default row blocking type is ALL.
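For example (myapp.bnd is a placeholder bind file name):

   db2 bind myapp.bnd blocking all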
Scenario 1: SKIP LOCKED DATA does not skip rows due to currently committed semantics
Create a table T1 as:
CREATE TABLE T1
(C1 INTEGER,
C2 VARCHAR(30))
Table 14.
C1 C2
1 AAAAAAAA
2 BBBBBBBB
3 CCCCCCCC
4 DDDDDDDD
5 EEEEEEEE
Note: CUR_COMMIT does not apply to SELECT statements when FOR UPDATE or RS isolation are used. If
SKIP LOCKED is used, such queries, as well as searched updates and deletes, will skip rows rather than
waiting for a lock when a lock conflict is encountered.
Scenario 2: SKIP LOCKED DATA does not skip rows due to lock avoidance
Create a table T1 as:
CREATE TABLE T1
(C1 INT,
C2 CHAR );
Table 16.
Session 1 Session 2
db2 +c "UPDATE T1 SET C1 = 99 WHERE C1
< 3"
• Session 1 acquires X lock on Rows 1 and 2
Lock management
Lock management is one of the factors that affect application performance. Review this section for details
about lock management considerations that can help you to maximize the performance of database
applications.
Lock escalation
Lock escalation is the process of converting many fine-grain locks to fewer coarse-grain locks, which
reduces memory overhead at the cost of decreasing concurrency.
It is the act of releasing a large number of fine-grain locks (row, MDC block, LOB, or XML locks) that are held by an application process on a single table, and acquiring instead a table lock, or another coarse-grain lock such as a block or LOB lock, of mode S or X.
Lock escalation occurs when an application exceeds the MAXLOCKS threshold or the database approaches
the LOCKLIST limit. The database manager writes messages (ADM5500W/ADM5501I) to the
administration notification log that identify the table for which lock escalation occurred, along with some
information to help you identify what plan or package was running when the escalation occurred.
The benefit of lock escalation is that operations that would otherwise fail with an SQL0912N error can
instead succeed. However, the operation may still fail due to lock
timeout or deadlock. As a drawback, lock escalation may negatively affect concurrency with other
applications which may need to access the table.
Lock granularity
If one application holds a lock on a database object, another application might not be able to access that
object. For this reason, row-level locks, which minimize the amount of data that is locked and therefore inaccessible to other applications, provide better concurrency than block-level, data partition-level, or table-level locks.
Lock attributes
Database manager locks have several basic attributes.
These attributes include the following:
Mode
The type of access allowed for the lock owner, as well as the type of access allowed for concurrent
users of the locked object. It is sometimes referred to as the state of the lock.
Object
The resource being locked. The only type of object that you can lock explicitly is a table. The database
manager also sets locks on other types of resources, such as rows and table spaces. Block locks can
also be set for multidimensional clustering (MDC) or insert time clustering (ITC) tables, and data
partition locks can be set for partitioned tables. The object being locked determines the granularity of
the lock.
Lock count
The length of time during which a lock is held. The isolation level under which a query runs affects the
lock count.
Table 17 on page 186 lists the lock modes and describes their effects, in order of increasing control over
resources.
If an index is not used, the entire table must be scanned in sequence to find the required rows, and the
optimizer will likely choose a single table-level lock (S). For example, if there is no index on the column
SEX, a table scan might be used to select all male employees, as follows:
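The statement is not shown here; it was presumably a simple query of the following form:

   select * from employee where sex = 'M'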
Note: Cursor-controlled processing uses the lock mode of the underlying cursor until the application finds
a row to update or delete. For this type of processing, no matter what the lock mode of the cursor might
be, an exclusive lock is always obtained to perform the update or delete operation.
Locking in range-clustered tables works slightly differently from standard key locking. When accessing a
range of rows in a range-clustered table, all rows in the range are locked, even when some of those rows
are empty. In standard key locking, only rows with existing data are locked.
Deferred access to data pages implies that access to a row occurs in two steps, which results in more
complex locking scenarios. The timing of lock acquisition and the persistence of locks depend on the
isolation level. Because the repeatable read (RR) isolation level retains all locks until the end of a
transaction, the locks acquired in the first step are held, and there is no need to acquire further locks
during the second step. For the read stability (RS) and cursor stability (CS) isolation levels, locks must be
acquired during the second step. To maximize concurrency, locks are not acquired during the first step,
and the reapplication of all predicates ensures that only qualifying rows are returned.
Next-key locking
During insertion of a key into an index, the row that corresponds to the key that will follow the new key in
the index is locked only if that row is currently locked by a repeatable read (RR) index scan. When this
occurs, insertion of the new index key is deferred until the transaction that performed the RR scan
completes.
The lock mode that is used for the next-key lock is NW (next key weak exclusive). This next-key lock is
released before key insertion occurs; that is, before a row is inserted into the table.
Key insertion also occurs when updates to a row result in a change to the value of the index key for that
row, because the original key value is marked deleted and the new key value is inserted into the index.
For updates that affect only the include columns of an index, the key can be updated in place, and no
next-key locking occurs.
During RR scans, the row that corresponds to the key that follows the end of the scan range is locked in S
mode. If no keys follow the end of the scan range, an end-of-table lock is acquired to lock the end of the
index. In the case of partitioned indexes for partitioned tables, locks are acquired to lock the end of each
index partition, instead of just one lock for the end of the index. If the key that follows the end of the scan
range is marked deleted, one of the following actions occurs:
• The scan continues to lock the corresponding rows until it finds a key that is not marked deleted
• The scan locks the corresponding row for that key
Note: Under the UR isolation level, if there are predicates on include columns in the index, the isolation
level is upgraded to CS and the locks are upgraded to an IS table lock or NS row locks.
Table 21. Lock Modes for RID Index Scans with No Predicates
Isolation level Read-only and Cursored operations Searched update or delete
ambiguous scans
Scan Where current Scan Update or
of delete
RR S/- IX/S IX/X X/- X/-
RS IS/NS IX/U IX/X IX/X IX/X
CS IS/NS IX/U IX/X IX/X IX/X
UR IN/- IX/U IX/X IX/X IX/X
Table 22. Lock Modes for RID Index Scans with a Single Qualifying Row
Isolation level Read-only and Cursored operations Searched update or delete
ambiguous scans
Scan Where current Scan Update or
of delete
RR IS/S IX/U IX/X IX/X IX/X
RS IS/NS IX/U IX/X IX/X IX/X
CS IS/NS IX/U IX/X IX/X IX/X
UR IN/- IX/U IX/X IX/X IX/X
Table 23. Lock Modes for RID Index Scans with Start and Stop Predicates Only
Isolation level Read-only and Cursored operations Searched update or delete
ambiguous scans
Scan Where current Scan Update or
of delete
RR IS/S IX/S IX/X IX/X IX/X
RS IS/NS IX/U IX/X IX/X IX/X
CS IS/NS IX/U IX/X IX/X IX/X
UR IN/- IX/U IX/X IX/X IX/X
Table 25. Lock Modes for Index Scans Used for Deferred Data Page Access: RID Index Scan with No
Predicates
Isolation level Read-only and Cursored operations Searched update or delete
ambiguous scans
Scan Where current Scan Update or
of delete
RR IS/S IX/S X/-
RS IN/- IN/- IN/-
CS IN/- IN/- IN/-
UR IN/- IN/- IN/-
Table 26. Lock Modes for Index Scans Used for Deferred Data Page Access: After a RID Index Scan with
No Predicates
Isolation level Read-only and Cursored operations Searched update or delete
ambiguous scans
Scan Where current Scan Update or
of delete
RR IN/- IX/S IX/X X/- X/-
RS IS/NS IX/U IX/X IX/X IX/X
CS IS/NS IX/U IX/X IX/X IX/X
UR IN/- IX/U IX/X IX/X IX/X
Table 27. Lock Modes for Index Scans Used for Deferred Data Page Access: RID Index Scan with
Predicates (sargs, resids)
Isolation level Read-only and Cursored operations Searched update or delete
ambiguous scans
Scan Where current Scan Update or
of delete
RR IS/S IX/S IX/S
RS IN/- IN/- IN/-
CS IN/- IN/- IN/-
UR IN/- IN/- IN/-
Table 29. Lock Modes for Index Scans Used for Deferred Data Page Access: RID Index Scan with Start and
Stop Predicates Only
Isolation level Read-only and Cursored operations Searched update or delete
ambiguous scans
Scan Where current Scan Update or
of delete
RR IS/S IX/S IX/X
RS IN/- IN/- IN/-
CS IN/- IN/- IN/-
UR IN/- IN/- IN/-
Table 30. Lock Modes for Index Scans Used for Deferred Data Page Access: After a RID Index Scan with
Start and Stop Predicates Only
Isolation level Read-only and Cursored operations Searched update or delete
ambiguous scans
Scan Where current Scan Update or
of delete
RR IN/- IX/S IX/X IX/X IX/X
RS IS/NS IX/U IX/X IX/U IX/X
CS IS/NS IX/U IX/X IX/U IX/X
UR IS/- IX/U IX/X IX/U IX/X
Lock modes for MDC and ITC tables and RID index scans
The type of lock that a multidimensional clustering (MDC) or insert time clustering (ITC) table obtains
during a table or RID index scan depends on the isolation level that is in effect and on the data access
plan that is being used.
The following tables show the types of locks that are obtained for MDC and ITC tables under each
isolation level for different access plans. Each entry has three parts: the table lock, the block lock, and the
row lock. A hyphen indicates that a particular lock granularity is not available.
Tables 9-14 show the types of locks that are obtained for RID index scans when the reading of data pages
is deferred. Under the UR isolation level, if there are predicates on include columns in the index, the
isolation level is upgraded to CS and the locks are upgraded to an IS table lock, an IS block lock, or NS
row locks.
• Table 1. Lock Modes for Table Scans with No Predicates
• Table 2. Lock Modes for Table Scans with Predicates on Dimension Columns Only
• Table 3. Lock Modes for Table Scans with Other Predicates (sargs, resids)
Table 32. Lock Modes for Table Scans with Predicates on Dimension Columns Only
Isolation Read-only and Cursored operation Searched update or delete
level ambiguous scans
Scan Where current of Scan Update or
delete
RR S/-/- U/-/- SIX/IX/X U/-/- SIX/X/-
RS IS/IS/NS IX/IX/U IX/IX/X IX/U/- X/X/-
CS IS/IS/NS IX/IX/U IX/IX/X IX/U/- X/X/-
UR IN/IN/- IX/IX/U IX/IX/X IX/U/- X/X/-
Table 33. Lock Modes for Table Scans with Other Predicates (sargs, resids)
Isolation Read-only and Cursored operation Searched update or delete
level ambiguous scans
Scan Where current of Scan Update or
delete
RR S/-/- U/-/- SIX/IX/X U/-/- SIX/IX/X
RS IS/IS/NS IX/IX/U IX/IX/X IX/IX/U IX/IX/X
Table 34. Lock Modes for RID Index Scans with No Predicates
Isolation level Read-only and Cursored operation Searched update or delete
ambiguous scans
Scan Where current Scan Update or
of delete
RR S/-/- IX/IX/S IX/IX/X X/-/- X/-/-
RS IS/IS/NS IX/IX/U IX/IX/X X/X/X X/X/X
CS IS/IS/NS IX/IX/U IX/IX/X X/X/X X/X/X
UR IN/IN/- IX/IX/U IX/IX/X X/X/X X/X/X
Table 35. Lock Modes for RID Index Scans with a Single Qualifying Row
Isolation level Read-only and Cursored operation Searched update or delete
ambiguous scans
Scan Where current Scan Update or
of delete
RR IS/IS/S IX/IX/U IX/IX/X X/X/X X/X/X
RS IS/IS/NS IX/IX/U IX/IX/X X/X/X X/X/X
CS IS/IS/NS IX/IX/U IX/IX/X X/X/X X/X/X
UR IN/IN/- IX/IX/U IX/IX/X X/X/X X/X/X
Table 36. Lock Modes for RID Index Scans with Start and Stop Predicates Only
Isolation level Read-only and Cursored operation Searched update or delete
ambiguous scans
Scan Where current Scan Update or
of delete
RR IS/IS/S IX/IX/S IX/IX/X IX/IX/X IX/IX/X
RS IS/IS/NS IX/IX/U IX/IX/X IX/IX/X IX/IX/X
CS IS/IS/NS IX/IX/U IX/IX/X IX/IX/X IX/IX/X
UR IN/IN/- IX/IX/U IX/IX/X IX/IX/X IX/IX/X
Table 37. Lock Modes for RID Index Scans with Index Predicates Only
Isolation level Read-only and Cursored operation Searched update or delete
ambiguous scans
Scan Where current Scan Update or
of delete
RR IS/S/S IX/IX/S IX/IX/X IX/IX/S IX/IX/X
RS IS/IS/NS IX/IX/U IX/IX/X IX/IX/U IX/IX/X
Table 38. Lock Modes for RID Index Scans with Other Predicates (sargs, resids)
Isolation level Read-only and Cursored operation Searched update or delete
ambiguous scans
Scan Where current Scan Update or
of delete
RR IS/S/S IX/IX/S IX/IX/X IX/IX/S IX/IX/X
RS IS/IS/NS IX/IX/U IX/IX/X IX/IX/U IX/IX/X
CS IS/IS/NS IX/IX/U IX/IX/X IX/IX/U IX/IX/X
UR IN/IN/- IX/IX/U IX/IX/X IX/IX/U IX/IX/X
Table 39. Lock Modes for Index Scans Used for Deferred Data Page Access: RID Index Scan with No
Predicates
Isolation level Read-only and Cursored operation Searched update or delete
ambiguous scans
Scan Where current Scan Update or
of delete
RR IS/S/S IX/IX/S X/-/-
RS IN/IN/- IN/IN/- IN/IN/-
CS IN/IN/- IN/IN/- IN/IN/-
UR IN/IN/- IN/IN/- IN/IN/-
Table 40. Lock Modes for Index Scans Used for Deferred Data Page Access: After a RID Index Scan with
No Predicates
Isolation level Read-only and Cursored operation Searched update or delete
ambiguous scans
Scan Where current Scan Update or
of delete
RR IN/IN/- IX/IX/S IX/IX/X X/-/- X/-/-
RS IS/IS/NS IX/IX/U IX/IX/X IX/IX/X IX/IX/X
CS IS/IS/NS IX/IX/U IX/IX/X IX/IX/X IX/IX/X
UR IN/IN/- IX/IX/U IX/IX/X IX/IX/X IX/IX/X
Table 42. Lock Modes for Index Scans Used for Deferred Data Page Access: After a RID Index Scan with
Predicates (sargs, resids)
Isolation level Read-only and Cursored operation Searched update or delete
ambiguous scans
Scan Where current Scan Update or
of delete
RR IN/IN/- IX/IX/S IX/IX/X IX/IX/S IX/IX/X
RS IS/IS/NS IX/IX/U IX/IX/X IX/IX/U IX/IX/X
CS IS/IS/NS IX/IX/U IX/IX/X IX/IX/U IX/IX/X
UR IN/IN/- IX/IX/U IX/IX/X IX/IX/U IX/IX/X
Table 43. Lock Modes for Index Scans Used for Deferred Data Page Access: RID Index Scan with Start and
Stop Predicates Only
Isolation level Read-only and Cursored operation Searched update or delete
ambiguous scans
Scan Where current Scan Update or
of delete
RR IS/IS/S IX/IX/S IX/IX/X
RS IN/IN/- IN/IN/- IN/IN/-
CS IN/IN/- IN/IN/- IN/IN/-
UR IN/IN/- IN/IN/- IN/IN/-
Table 44. Lock Modes for Index Scans Used for Deferred Data Page Access: After a RID Index Scan with
Start and Stop Predicates Only
Isolation level Read-only and Cursored operation Searched update or delete
ambiguous scans
Scan Where current Scan Update or
of delete
RR IN/IN/- IX/IX/S IX/IX/X IX/IX/X IX/IX/X
RS IS/IS/NS IX/IX/U IX/IX/X IX/IX/U IX/IX/X
CS IS/IS/NS IX/IX/U IX/IX/X IX/IX/U IX/IX/X
UR IS/-/- IX/IX/U IX/IX/X IX/IX/U IX/IX/X
Table 46. Lock Modes for Index Scans with Predicates on Dimension Columns Only
Isolation level Read-only and Cursored operation Searched update or delete
ambiguous scans
Scan Where current Scan Update or
of delete
RR IS/-/- IX/IX/S IX/IX/X X/-/- X/-/-
RS IS/IS/NS IX/IX/U IX/IX/X IX/X/- IX/X/-
Table 47. Lock Modes for Index Scans with Start and Stop Predicates Only
Isolation level Read-only and Cursored operation Searched update or delete
ambiguous scans
Scan Where current Scan Update or
of delete
RR IS/S/- IX/IX/S IX/IX/S IX/IX/S IX/IX/S
RS IX/IX/S IX/IX/U IX/IX/X IX/IX/- IX/IX/-
CS IX/IX/S IX/IX/U IX/IX/X IX/IX/- IX/IX/-
UR IN/IN/- IX/IX/U IX/IX/X IX/IX/- IX/IX/-
Table 49. Lock Modes for Index Scans Used for Deferred Data Page Access: Block Index Scan with No
Predicates
Isolation level Read-only and Cursored operation Searched update or delete
ambiguous scans
Scan Where current Scan Update or
of delete
RR IS/S/-- IX/IX/S X/--/--
RS IN/IN/-- IN/IN/-- IN/IN/--
CS IN/IN/-- IN/IN/-- IN/IN/--
UR IN/IN/-- IN/IN/-- IN/IN/--
Table 50. Lock Modes for Index Scans Used for Deferred Data Page Access: After a Block Index Scan with
No Predicates
Isolation level Read-only and Cursored operation Searched update or delete
ambiguous scans
Scan Where current Scan Update or
of delete
RR IN/IN/-- IX/IX/S IX/IX/X X/--/-- X/--/--
Table 51. Lock Modes for Index Scans Used for Deferred Data Page Access: Block Index Scan with
Predicates on Dimension Columns Only
Isolation level Read-only and Cursored operation Searched update or delete
ambiguous scans
Scan Where current Scan Update or
of delete
RR IS/S/-- IX/IX/-- IX/S/--
RS IS/IS/NS IX/--/-- IX/--/--
CS IS/IS/NS IX/--/-- IX/--/--
UR IN/IN/-- IX/--/-- IX/--/--
Table 52. Lock Modes for Index Scans Used for Deferred Data Page Access: After a Block Index Scan with
Predicates on Dimension Columns Only
Isolation level Read-only and Cursored operation Searched update or delete
ambiguous scans
Scan Where current Scan Update or
of delete
RR IN/IN/-- IX/IX/S IX/IX/X IX/S/-- IX/X/--
RS IS/IS/NS IX/IX/U IX/IX/X IX/U/-- IX/X/--
CS IS/IS/NS IX/IX/U IX/IX/X IX/U/-- IX/X/--
UR IN/IN/-- IX/IX/U IX/IX/X IX/U/-- IX/X/--
Table 53. Lock Modes for Index Scans Used for Deferred Data Page Access: Block Index Scan with Start
and Stop Predicates Only
Isolation level Read-only and Cursored operation Searched update or delete
ambiguous scans
Scan Where current Scan Update or
of delete
RR IS/S/-- IX/IX/-- IX/X/--
RS IN/IN/-- IN/IN/-- IN/IN/--
CS IN/IN/-- IN/IN/-- IN/IN/--
UR IN/IN/-- IN/IN/-- IN/IN/--
Table 55. Lock Modes for Index Scans Used for Deferred Data Page Access: Block Index Scan with Other
Predicates (sargs, resids)
Isolation level Read-only and Cursored operation Searched update or delete
ambiguous scans
Scan Where current Scan Update or
of delete
RR IS/S/-- IX/IX/-- IX/IX/--
RS IN/IN/-- IN/IN/-- IN/IN/--
CS IN/IN/-- IN/IN/-- IN/IN/--
UR IN/IN/-- IN/IN/-- IN/IN/--
Table 56. Lock Modes for Index Scans Used for Deferred Data Page Access: After a Block Index Scan with
Other Predicates (sargs, resids)
Isolation level Read-only and Cursored operation Searched update or delete
ambiguous scans
Scan Where current Scan Update or
of delete
RR IN/IN/-- IX/IX/S IX/IX/X IX/IX/S IX/IX/X
RS IS/IS/NS IX/IX/U IX/IX/X IX/IX/U IX/IX/X
CS IS/IS/NS IX/IX/U IX/IX/X IX/IX/U IX/IX/X
UR IN/IN/-- IX/IX/U IX/IX/X IX/IX/U IX/IX/X
26 record(s) selected.
In this example, a lock object of type TABLE_LOCK and a DATA_PARTITION_ID of -1 are used to control
access to and concurrency on the partitioned table TP1. The lock objects of type TABLE_PART_LOCK are
used to control most access to and concurrency on each data partition.
There are additional lock objects of type TABLE_LOCK captured in this output (TAB_FILE_ID 4 through
16) that do not have a value for DATA_PARTITION_ID. A lock of this type, where an object with a
TAB_FILE_ID and a TBSP_NAME corresponds to a data partition or index on the partitioned table, might be
used to control concurrency with the online backup utility.
Lock conversion
Changing the mode of a lock that is already held is called lock conversion.
Lock conversion occurs when a process accesses a data object on which it already holds a lock, and the
access mode requires a more restrictive lock than the one already held. A process can hold only one lock
on a data object at any given time, although it can request a lock on the same data object many times
indirectly through a query.
Some lock modes apply only to tables, others only to rows, blocks, or data partitions. For rows or blocks,
conversion usually occurs if an X lock is needed and an S or U lock is held.
Deadlocks
A deadlock is created when two applications each lock data that is needed by the other, resulting in a situation
in which neither application can continue executing.
For example, in Figure 23 on page 204, there are two applications running concurrently: Application A and
Application B. The first transaction for Application A is to update the first row in Table 1, and the second
transaction is to update the second row in Table 2. Application B updates the second row in Table 2 first,
and then the first row in Table 1. At time T1, Application A locks the first row in Table 1. At the same time,
Application B locks the second row in Table 2. At time T2, Application A requests a lock on the second
row in Table 2. However, at the same time, Application B tries to lock the first row in Table 1. Because
Application A will not release its lock on the first row in Table 1 until it can complete an update to the
second row in Table 2, and Application B will not release its lock on the second row in Table 2 until it can
complete an update to the first row in Table 1, a deadlock occurs. The applications wait until one of them
releases its lock on the data.
Because applications do not voluntarily release locks on data that they need, a deadlock detector process
is required to break deadlocks. The deadlock detector monitors information about agents that are waiting
on locks, and awakens at intervals that are specified by the dlchktime database configuration
parameter.
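For example, to have the deadlock detector wake up every 10 seconds (the value is specified in milliseconds; SAMPLE is a placeholder database name):

   db2 update db cfg for sample using DLCHKTIME 10000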
If it finds a deadlock, the deadlock detector arbitrarily selects one deadlocked process as the victim
process to roll back. The victim process is awakened, and returns SQLCODE -911 (SQLSTATE 40001),
with reason code 2, to the calling application. The database manager rolls back uncommitted
transactions from the selected process automatically. When the rollback operation is complete, locks that belonged to the victim process are released, and the other applications can continue.
Member 1:
db2 -v "select * from tab1"
db2 -v "delete from tab3 where col1 > 100"
Member 2:
db2 -v "insert into tab2 values (20,20)"
db2 -v "select * from tab4"
To ensure that tables remain in the NOT_SHARED state, tune your applications or use EHL for workloads
where only a single member accesses a data table, partitioned table, or partitioned index.
Example 2: In the following example, the Db2 database server detects that the EHL optimization does not
apply. Multiple members are all attempting to access the same table tab1. The table does not transition
to the NOT_SHARED state.
Member 1:
db2 -v "select * from tab1"
Member 2:
db2 -v "insert into tab1 values (20,20)"
Member 1:
db2 -v "delete from tab1 where col1 > 100"
Use MON_GET_TABLE to monitor whether or not tables transition to the NOT_SHARED state.
Example 3: Running extent reclaim on a remote member. Assume there is a table named T5_01A, which
is currently in the NOT_SHARED state on Member 1:
• M1: db2 "connect to testdb"
• M1: db2 "select count(*) from T5_01A"
• M1: db2 "select data_sharing_state from table(mon_get_table(null, null, -1)) where
tabname='T5_01A'"
DATA_SHARING_STATE
-------------------
NOT_SHARED
Extent reclaim is started at Member 0. The operation will need to move an extent which belongs to the
T5_01A table.
• M0: db2 "alter tablespace TS_01A reduce max"
As a result, the table will become shared.
• M1: db2 "select data_sharing_state from table(mon_get_table(null, null, -1)) where
tabname='T5_01A'"
DATA_SHARING_STATE
-------------------
SHARED
Example 4: Running extent reclaim locally. Assume there is a table named T5_01A, which is currently in
the NOT_SHARED state on Member 1:
• M1: db2 "connect to testdb"
• M1: db2 "select count(*) from T5_01A"
• M1: db2 "select data_sharing_state from table(mon_get_table(null, null, -1)) where
tabname='T5_01A'"
DATA_SHARING_STATE
-------------------
NOT_SHARED
Extent reclaim is started at Member 1. The operation will need to move an extent which belongs to the
T5_01A table.
• M1: db2 "alter tablespace TS_01A reduce max"
The table state will remain unchanged.
• M1: db2 "select data_sharing_state from table(mon_get_table(null, null, -1)) where
tabname='T5_01A'"
DATA_SHARING_STATE
-------------------
NOT_SHARED
Monitoring EHL
EHL can be monitored using the following monitors and APIs:
• The LOCK WAIT event monitor
• MON_GET_DATABASE() administrative API
• MON_GET_TABLE() administrative API
• MON_GET_LOCKS() administrative API
• MON_GET_APPL_LOCKWAIT() administrative API
Query optimization
Query optimization is one of the factors that affect application performance. Review this section for
details about query optimization considerations that can help you to maximize the performance of
database applications.
1. Parse query
The SQL and XQuery compiler analyzes the query to validate the syntax. If any syntax errors are
detected, the query compiler stops processing and returns an appropriate error to the application that
submitted the query. When parsing is complete, an internal representation of the query is created and
stored in the query graph model.
2. Check semantics
The compiler ensures that there are no inconsistencies among parts of the statement. For example,
the compiler verifies that a column specified for the YEAR scalar function has been defined with a
datetime data type.
The following user-written query lists those employees who have a high level of education and who earn
more than $35,000 per year:
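The view definitions and the query itself are not reproduced here. A minimal sketch, assuming two hypothetical views over the EMPLOYEE table (EMP_EDUCATION exposing the education level and EMP_SALARIES exposing the salary), might look like this:

   -- hypothetical views standing in for the original example's views
   create view emp_education (empno, edlevel) as
     select empno, edlevel from employee;

   create view emp_salaries (empno, salary) as
     select empno, salary from employee;

   -- user-written query against the two views
   select e.empno, s.salary
     from emp_education e, emp_salaries s
     where e.empno = s.empno
       and e.edlevel > 17
       and s.salary > 35000;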
During query rewrite, these two views could be merged to create the following query:
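The merged query is not shown either; under the same assumptions, it would resemble:

   -- view bodies merged into the user-written statement
   select e1.empno, e2.salary
     from employee e1, employee e2
     where e1.empno = e2.empno
       and e1.edlevel > 17
       and e2.salary > 35000;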
By merging the SELECT statements from the two views with the user-written SELECT statement, the
optimizer can consider more choices when selecting an access plan. In addition, if the two views that
have been merged use the same base table, additional rewriting might be performed.
The SQL and XQuery compiler can eliminate the join and simplify the query to:
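Continuing the same sketch: because both merged views are based on the EMPLOYEE table and the join is on its primary key (EMPNO), the self-join is redundant and the statement can be reduced to:

   select empno, salary
     from employee
     where edlevel > 17
       and salary > 35000;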
The following example assumes that a referential constraint exists between the EMPLOYEE and
DEPARTMENT tables on the department number. First, a view is created.
A query against this view that references only EMPLOYEE columns can then be rewritten to eliminate the redundant join with the DEPARTMENT table.
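A minimal sketch of this rewrite, assuming the usual SAMPLE-style columns (EMPLOYEE.WORKDEPT referencing DEPARTMENT.DEPTNO) and a hypothetical view name, might be:

   -- view defined with a join over the referential constraint
   create view emp_dept_view as
     select e.empno, e.lastname, e.salary, d.deptno, d.deptname
       from employee e, department d
       where e.workdept = d.deptno;

   -- a query that references only EMPLOYEE columns
   select empno, lastname, salary
     from emp_dept_view;

   -- can be rewritten to drop the join entirely, because the constraint
   -- (with a non-nullable foreign key) guarantees exactly one matching row
   select empno, lastname, salary
     from employee;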
Note that in this situation, even if you know that the query can be rewritten, you might not be able to do
so because you do not have access to the underlying tables. You might only have access to the view
(shown previously). Therefore, this type of optimization has to be performed by the database manager.
Redundancy in referential integrity joins likely occurs when:
• Views are defined with joins
• Queries are automatically generated
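The query that the next paragraph discusses is not shown; a minimal stand-in, assuming that EMPNO is the primary key of the EMPLOYEE table, would be:

   select distinct empno, lastname
     from employee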
In this example, because the primary key is being selected, the compiler knows that each returned row is
unique. In this case, the DISTINCT keyword is redundant. If the query is not rewritten, the optimizer must
build a plan with necessary processing (such as a sort) to ensure that the column values are unique.
The following query against this view is not as efficient as it could be:
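Assuming that D11_EMPLOYEE is a view over EMPLOYEE restricted to department 'D11', the query was presumably similar to:

   select firstnme, phoneno
     from d11_employee
     where lastname = 'BROWN'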
During query rewrite, the compiler pushes the lastname = 'BROWN' predicate down into the
D11_EMPLOYEE view. This allows the predicate to be applied sooner and potentially more efficiently. The
actual query that could be executed in this example is as follows:
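Under the same assumption, the pushed-down form would resemble:

   select firstnme, phoneno
     from employee
     where workdept = 'D11'
       and lastname = 'BROWN'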
Predicate pushdown is not limited to views. Other situations in which predicates can be pushed down
include UNION, GROUP BY, and derived tables (nested table expressions or common table expressions).
Because this query is correlated, and because both PROJECT and EMPLOYEE are unlikely to be
partitioned on PROJNO, the broadcasting of each project to each database partition is possible. In
addition, the subquery would have to be evaluated many times.
The compiler can rewrite the query as follows:
• Determine the distinct list of employees working on programming projects and call it DIST_PROJS. It
must be distinct to ensure that aggregation is done only once for each project:
• Join DIST_PROJS with the EMPLOYEE table to get the average compensation per project,
AVG_PER_PROJ:
avg_per_proj(projno, avg_comp) as
(select p2.projno, avg(e1.salary+e1.bonus+e1.comm)
from employee e1, dist_projs p2
where e1.empno = p2.empno
group by p2.projno)
This query computes the avg_comp per project (avg_per_proj). The result can then be broadcast to all
database partitions that contain the EMPLOYEE table.
Compiler rewrite example: Operation movement - Predicate pushdown for combined SQL/XQuery
statements
One fundamental technique for the optimization of relational SQL queries is to move predicates in the
WHERE clause of an enclosing query block into an enclosed lower query block (for example, a view),
thereby enabling early data filtering and potentially better index usage.
This is even more important in partitioned database environments, because early filtering potentially
reduces the amount of data that must be shipped between database partitions.
Similar techniques can be used to move predicates or XPath filters inside of an XQuery. The basic strategy
is always to move filtering expressions as close to the data source as possible. This optimization
technique is called predicate pushdown in SQL and extraction pushdown (for filters and XPath extractions)
in XQuery.
Document 1

<customer>
  <name>John</name>
  <lastname>Doe</lastname>
  <date_of_birth>1976-10-10</date_of_birth>
  <address>
    <zip>95141.0</zip>
  </address>
  <volume>80000.0</volume>
</customer>
<customer>
  <name>Jane</name>
  <lastname>Doe</lastname>
  <date_of_birth>1975-01-01</date_of_birth>
  <address>
    <zip>95141.4</zip>
  </address>
  <volume>50000.00</volume>
</customer>

Document 2

<customer>
  <name>Michael</name>
  <lastname>Miller </lastname>
  <date_of_birth>1975-01-01</date_of_birth>
  <address>
    <zip>95142.0</zip>
  </address>
  <volume>100000.00</volume>
</customer>
<customer>
  <name>Michaela</name>
  <lastname>Miller</lastname>
  <date_of_birth>1980-12-23</date_of_birth>
  <address>
    <zip>95140.5</zip>
  </address>
  <volume>100000</volume>
</customer>
To use possible indexes on T.XMLDOC and to filter out records that are not needed early on, the zip =
95141 predicate will be internally converted into the following equivalent XPATH filtering expression:
Because schema information for XML fragments is not used by the compiler, it cannot be assumed that
ZIP contains integers only. It is possible that there are other numeric values with a fractional part and a
corresponding double XML index on this specific XPath extraction. The XML2SQL cast would handle this
transformation by truncating the fractional part before casting the value to INTEGER. This behavior must
be reflected in the pushdown procedure, and the predicate must be changed to remain semantically
correct.
To use possible indexes on T.XMLDOC and to filter out records that are not needed early on, the
lastname = 'Miller' predicate will be internally converted into an equivalent XPATH filtering
expression. A high-level representation of this expression is:
Trailing blanks are treated differently in SQL than in XPath or XQuery. The original SQL predicate will not
distinguish between the two customers whose last name is "Miller", even if one of them (Michael) has a
trailing blank. Consequently, both customers are returned, which would not be the case if an unchanged
predicate were pushed down.
The solution is to transform the predicate into a range filter.
• The first boundary is created by truncating all trailing blanks from the comparison value, using the
RTRIM() function.
• The second boundary is created by looking up all possible strings that are greater than or equal to
"Miller", so that all strings that begin with "Miller" are located. Therefore, the original string is replaced
with an upperbound string that represents this second boundary.
This query can be rewritten with the following implied predicate, known as a predicate for transitive
closure:
dept.mgrno = proj.respemp
The optimizer can now consider additional joins when it tries to select the best access plan for the query.
During the query rewrite stage, additional local predicates are derived on the basis of the transitivity that
is implied by equality predicates. For example, the following query returns the names of the departments
whose department number is greater than E00, and the employees who work in those departments.
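The query itself is not shown; a sketch using the SAMPLE-style DEPARTMENT and EMPLOYEE columns might be:

   select d.deptname, e.lastname
     from department d, employee e
     where d.deptno = e.workdept
       and d.deptno > 'E00'

   -- implied local predicate derived by transitive closure:
   --   e.workdept > 'E00'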
Example - OR to IN transformations
Suppose that an OR clause connects two or more simple equality predicates on the same column, as in
the following example:
select *
from employee
where
deptno = 'D11' or
deptno = 'D21' or
deptno = 'E21'
If there is no index on the DEPTNO column, using an IN predicate in place of OR causes the query to be
processed more efficiently:
select *
from employee
where deptno in ('D11', 'D21', 'E21')
In some cases, the database manager might convert an IN predicate into a set of OR clauses so that index
ORing can be performed.
where
name = :hv1 and
dept = :hv2 and
years > :hv5
The first two predicates (name = :hv1 and dept = :hv2) are range-delimiting predicates, and
years > :hv5 is an index-sargable predicate.
Attention: When a jump scan or skip scan is applied, for performance purposes, years > :hv5 is
treated as both a range-delimited predicate and an index-sargable predicate.
The optimizer uses index data instead of reading the base table when it evaluates these predicates.
Index-sargable predicates reduce the number of rows that need to be read from the table, but they do not
affect the number of index pages that are accessed.
Residual predicates
Residual predicates require I/O beyond simply accessing a base table, which makes them the most expensive type of predicate. Such predicates
might:
• Use correlated subqueries
• If the collating sequences are the same, the query predicates can probably be pushed down to Db2 for
z/OS. Filtering and grouping results at the data source is usually more efficient than copying the entire
table and performing the operations locally. For this query, the predicates and the GROUP BY operation
can take place at the data source.
• If the collating sequences are not the same, both predicates cannot be evaluated at the data source.
However, the optimizer might decide to push down the salary > 50000 predicate. The range
comparison must still be done locally.
• If the collating sequences are the same, and the optimizer knows that the local Db2 server is very fast,
the optimizer might decide that performing the GROUP BY operation locally is the least expensive
approach. The predicate is evaluated at the data source. This is an example of pushdown analysis
combined with global optimization.
In general, the goal is to ensure that the optimizer evaluates functions and operators at remote data
sources. Many factors affect whether a function or an SQL operator can be evaluated at a remote data
source, including the following:
Understanding why a query is evaluated at a data source instead of by the Db2 server
Consider the following key questions when you investigate ways to increase pushdown opportunities:
• Why isn't this predicate being evaluated remotely?
Data-access methods
When it compiles an SQL or XQuery statement, the query optimizer estimates the execution cost of
different ways of satisfying the query.
Based on these estimates, the optimizer selects an optimal access plan. An access plan specifies the
order of operations that are required to resolve an SQL or XQuery statement. When an application
program is bound, a package is created. This package contains access plans for all of the static SQL and
XQuery statements in that application program. Access plans for dynamic SQL and XQuery statements are
created at run time.
There are three ways to access data in a table:
• By scanning the entire table sequentially
• By accessing an index on the table to locate specific rows
• By scan sharing
Rows might be filtered according to conditions that are defined in predicates, which are usually stated in a
WHERE clause. The selected rows in accessed tables are joined to produce the result set, and this data
might be further processed by grouping or sorting of the output.
Starting with Db2 9.7, scan sharing, which is the ability of a scan to use the buffer pool pages of another
scan, is default behavior. Scan sharing increases workload concurrency and performance. With scan
sharing, the system can support a larger number of concurrent applications, queries can perform better,
and system throughput can increase, benefiting even queries that do not participate in scan sharing. Scan
sharing is particularly effective in environments with applications that perform heavy scans, such as table scans or MDC block index scans of large tables.
The following predicates could be used to limit the range of a scan that uses index IX1:
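The first group of predicates is not shown here. Assuming that IX1 is a composite index whose key columns are NAME, DEPT, MGR, SALARY, and YEARS, in that order (as the later discussion implies), the first group was presumably the leading-column equality predicates:

   where
      name = :hv1 and
      dept = :hv2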
or
where
mgr = :hv1 and
name = :hv2 and
dept = :hv3
The second WHERE clause demonstrates that the predicates do not have to be specified in the order in
which the key columns appear in the index. Although the examples use host variables, other variables,
such as parameter markers, expressions, or constants, could be used instead.
In the following WHERE clause, only the predicates that reference NAME and DEPT would be used for
limiting the range of the index scan:
where
name = :hv1 and
dept = :hv2 and
salary = :hv4 and
years = :hv5
Because there is a key column (MGR) separating these columns from the last two index key columns,
the ordering would be off. However, after the range is determined by the name = :hv1 and dept
= :hv2 predicates, the other predicates can be evaluated against the remaining index key columns.
• Consider an index that was created using the ALLOW REVERSE SCANS option:
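The CREATE INDEX statement is not shown here; based on the description that follows, it was presumably of this general form (CUSTOMER is a placeholder table name):

   create index iname on customer (cname desc) allow reverse scans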
In this case, the index (INAME) is based on descending values in the CNAME column. Although the
index is defined for scans running in descending order, a scan can be done in ascending order. Use of
the index is controlled by the optimizer when creating and considering access plans.
where
name = :hv1 and
dept > :hv2 and
dept < :hv3 and
mgr < :hv4
However, in access plans that use jump scans, multiple columns with strict inequality predicates can be
considered for limiting the range of an index scan. In this example (assuming the optimizer chooses an
access plan with a jump scan), the strict inequality predicates on the DEPT, and MGR columns can be
used to limit the range. While this example focuses on strict inequality predicates, note that the equality
predicate on the NAME column is also used to limit the range.
• Inclusive inequality predicates
The inclusive inequality operators that are used for range-limiting predicates are:
– >= and <=
where
name = :hv1 and
dept >= :hv2 and
dept <= :hv3 and
mgr <= :hv4
Suppose that :hv2 = 404, :hv3 = 406, and :hv4 = 12345. The database manager will scan the
index for departments 404 and 405, but it will stop scanning department 406 when it reaches the first
manager whose employee number (MGR column) is greater than 12345.
where
name = 'JONES' and
dept = 'D93'
order by mgr
For this query, the index might be used to order the rows, because NAME and DEPT will always be the
same values and will therefore be ordered. That is, the preceding WHERE and ORDER BY clauses are
equivalent to:
where
name = 'JONES' and
dept = 'D93'
order by name, dept, mgr
A unique index can also be used to truncate a sort-order requirement. Consider the following index
definition and ORDER BY clause:
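The definition and clause were presumably similar to the following, assuming a PROJECT table with PROJNO and PROJNAME columns:

   create unique index ix0 on project (projno asc)

   select projno, projname
     from project
     order by projno, projname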
Additional ordering on the PROJNAME column is not required, because the IX0 index ensures that
PROJNO is unique: There is only one PROJNAME value for each PROJNO value.
Jump scans
Queries against tables with composite (multi-column) indexes present a particular challenge when
designing indexes for tables. Ideally, a query's predicates are consistent with a table's composite index.
This would mean that each predicate could be used as a start-stop key, which would, in turn, reduce the
scope of the index needing to be searched. When a query contains predicates that are inconsistent with a composite index, the result is an index gap. For example, consider the following query against a table with the IX1 composite index described earlier:
where
name = :hv1 and
dept = :hv2 and
mgr = :hv3 and
years IS NOT NULL
This query contains an index gap on the SALARY key part of the composite index (this is assuming
that the access plan contains an index scan on the composite index). The SALARY column cannot be
included as a start-stop predicate. The SALARY column is an example of an unconstrained index gap.
Note: For some queries it can be difficult to assess whether or not there are index gaps. Use the
EXPLAIN output to determine if index gaps are present.
Constrained index gap
Consider the following query against a table with the IX1 composite index defined earlier in this topic:
where
name = :hv1 and
dept = :hv2 and
mgr = :hv3 and
salary < :hv4 and
years = :hv5
This query contains an index gap on the SALARY key part of the composite index (this is assuming that the
access plan contains an index scan on the composite index). Since the SALARY column in the query is not
an equality predicate, start-stop values cannot be generated for this column. The SALARY key part
represents a constrained index gap.
To avoid poor performance in queries with index gaps, the optimizer can perform a jump scan operation.
In a jump scan operation, the index manager identifies qualifying keys for small sections of a composite
index where there are gaps, and fills these gaps with these qualifying keys. The end result is that the
index manager skips over parts of the index that will not yield any results.
Jump scan restrictions
For queries being issued where you expect a jump scan, verify that the target table has an appropriate
composite index and that the query predicates introduce an index gap. The Db2 optimizer will not
create plans with jump scans if there are no index gaps.
Jump scans do not scan the following types of indexes:
• range-clustered table indexes
• extended indexes (for example, spatial indexes)
• XML indexes
• text indexes (for Text Search)
With jump scans, a gap column with an IN-List predicate might be treated as an unconstrained gap
column. Also, in databases with Unicode Collation Algorithm (UCA) collation, jump scans might treat
LIKE predicates with host variables or parameter markers as unconstrained gaps.
Note: When evaluating queries, there can be cases where the optimizer chooses an access plan that does
not include a jump scan operation, even if index gaps are present. This would occur if the optimizer
deems an alternative to using a jump scan to be more efficient.
Index-only access
In some cases, all of the required data can be retrieved from an index without accessing the table. This is
known as index-only access. For example, consider the following index definition:
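The index definition is not reproduced here; a minimal stand-in on the EMPLOYEE table might be:

   create index emp_name_ix on employee (name asc)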
The following query can be satisfied by accessing only the index, without reading the base table:
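For example, under the same assumption:

   select name
     from employee
     where name = 'SMITH'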
Often, however, required columns do not appear in an index. To retrieve the data from these columns,
table rows must be read. To enable the optimizer to choose index-only access, create a unique index with
include columns. For example, consider the following index definition:
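The definition is not shown here; based on the description that follows, it was presumably similar to:

   create unique index emp_name_incl_ix on employee (name asc)
      include (dept, mgr, salary, years)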
This index enforces the uniqueness of the NAME column and also stores and maintains data for the DEPT,
MGR, SALARY, and YEARS columns. In this way, the following query can be satisfied by accessing only the
index:
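For example:

   select name, dept, mgr, salary, years
     from employee
     where name = 'SMITH'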
Be sure to consider whether the additional storage space and maintenance costs of include columns are
justified. If queries that exploit include columns are rarely executed, the costs might not be justified.
Multiple-index access
The optimizer can choose to scan multiple indexes on the same table to satisfy the predicates of a
WHERE clause. For example, consider the following two index definitions:
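The definitions are not reproduced here; based on the predicates and the discussion that follow, IX2 and IX3 were presumably similar to the following (EMPLOYEE is assumed as the table):

   create index ix2 on employee (dept asc)
   create index ix3 on employee (job asc, years asc)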
where
dept = :hv1 or
(job = :hv2 and
years >= :hv3)
Scanning index IX2 produces a list of record IDs (RIDs) that satisfy the dept = :hv1 predicate.
Scanning index IX3 produces a list of RIDs that satisfy the job = :hv2 and years >= :hv3
predicate. These two lists of RIDs are combined, and duplicates are removed before the table is
accessed. This is known as index ORing.
Index ORing can also be used for predicates that are specified by an IN clause, as in the following
example:
where
dept in (:hv1, :hv2, :hv3)
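The next example, which illustrates dynamic bitmap ANDing, refers to two further indexes, IX4 and IX5, whose definitions are also not shown; they were presumably similar to:

   create index ix4 on employee (salary asc)
   create index ix5 on employee (comm asc)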
where
salary between 20000 and 30000 and
comm between 1000 and 3000
In this example, scanning index IX4 produces a bitmap that satisfies the salary between 20000 and
30000 predicate. Scanning IX5 and probing the bitmap for IX4 produces a list of qualifying RIDs that
satisfy both predicates. This is known as dynamic bitmap ANDing. It occurs only if the table has sufficient
cardinality, its columns have sufficient values within the qualifying range, or there is sufficient duplication
if equality predicates are used.
To realize the performance benefits of dynamic bitmaps when scanning multiple indexes, it might be
necessary to change the value of the sortheap database configuration parameter and the sheapthres
database manager configuration parameter. Additional sort heap space is required when dynamic
bitmaps are used in access plans. When sheapthres is set to be relatively close to sortheap (that is,
less than a factor of two or three times per concurrent query), dynamic bitmaps with multiple index
access must work with much less memory than the optimizer anticipated. The solution is to increase the
value of sheapthres relative to sortheap.
The optimizer does not combine index ANDing and index ORing when accessing a single table.
delete from t1
where c1 = 99
Rows
RETURN
( 1)
Cost
I/O
|
1
DELETE
( 2)
16.3893
2
/----+-----\
1 1000
IXSCAN CO-TABLE: BLUUSER
( 3) T1
9.10425 Q1
|
1
1000
INDEX: BLUUSER
UK2
Q2
Scan sharing
Scan sharing refers to the ability of one scan to exploit the work done by another scan. Examples of
shared work include disk page reads, disk seeks, buffer pool content reuse, decompression, and so on.
Heavy scans, such as table scans or multidimensional clustering (MDC) block index scans of large tables,
are sometimes eligible for sharing page reads with other scans. Such shared scans can start at an
arbitrary point in the table, to take advantage of pages that are already in the buffer pool. When a sharing
scan reaches the end of the table, it continues at the beginning and finishes when it reaches the point at
which it started. This is called a wrapping scan. Figure 25 on page 237 shows the difference between
regular scans and wrapping scans for both tables and indexes.
The scan sharing feature is enabled by default, and eligibility for scan sharing and for wrapping are
determined automatically by the SQL compiler. At run time, an eligible scan might or might not participate
in sharing or wrapping, based on factors that were not known at compile time.
Shared scanners are managed in share groups. These groups keep their members together as long as
possible, so that the benefits of sharing are maximized. If one scan is faster than another scan, the
benefits of page sharing can be lost. In this situation, buffer pool pages that are accessed by the first scan
might be cleared from the buffer pool before another scan in the share group can access them. The data
server measures the distance between two scans in the same share group by the number of buffer pool
pages that lies between them. The data server also monitors the speed of the scans. If the distance
between two scans in the same share group grows too large, they might not be able to share buffer pool
pages. To reduce this effect, faster scans can be throttled to allow slower scans to access the data pages
before they are cleared. Figure 26 on page 238 shows two sharing sets, one for a table and one for a block
index. A sharing set is a collection of share groups that are accessing the same object (for example, a
table) through the same access mechanism (for example, a table scan or a block index scan). For table
scans, the page read order increases by page ID; for block index scans, the page read order increases by
key value.
The figure also shows how buffer pool content is reused within groups. Consider scan C, which is the
leading scan of group 1. The following scans (A and B) are grouped with C, because they are close and can
likely reuse the pages that C has brought into the buffer pool.
A high-priority scanner is never throttled by a lower priority one, and might move to another share group
instead. A high priority scanner might be placed in a group where it can benefit from the work being done
by the lower priority scanners in the group. It will stay in that group for as long as that benefit is available.
By either throttling the fast scanner, or moving it to a faster share group (if the scanner encounters one),
the data server adjusts the share groups to ensure that sharing remains optimized.
You can use the db2pd command to view information about scan sharing. For example, for an individual
shared scan, the db2pd output will show data such as the scan speed and the amount of time that the
scan was throttled. For a sharing group, the command output shows the number of scans in the group and
the number of pages shared by the group.
The EXPLAIN_ARGUMENT table has new rows to contain scan-sharing information about table scans and
index scans (you can use the db2exfmt command to format and view the contents of this table).
You can use optimizer profiles to override decisions that the compiler makes about scan sharing (see
"Access types"). Such overrides are for use only when a special need arises; for example, the wrapping
hint can be useful when a repeatable order of records in a result set is needed, but an ORDER BY clause
(which might trigger a sort) is to be avoided. Otherwise, it is recommended that you not use these
optimization profiles unless requested to do so by Db2 Service.
Restrictions
The following restrictions are applicable to scenarios that involve indexes with random ordering:
• The key part of an index with random ordering cannot use range-delimiting predicates to satisfy LIKE, <, <=, >, >=, or IS NOT NULL predicates.
• Predicates that compare BIGINT or DECIMAL random ordered index column types to REAL or DOUBLE
values cannot be applied as start-stop keys on the random ordered index.
• Predicates that compare REAL or DOUBLE random ordered index column types to DECFLOAT values
cannot be applied as start-stop keys on the random ordered index.
6) UNIQUE: (Unique)
Cumulative Total Cost: 132.519
Cumulative CPU Cost: 1.98997e+06
...
...
Arguments:
---------
JN INPUT: (Join input leg)
INNER
UNIQKEY : (Unique Key columns)
1: Q1.C22
UNIQKEY : (Unique Key columns)
2: Q1.C21
pUNIQUE : (Uniqueness required flag)
HASHED PARTIAL
Hash join now selected by the query optimizer for a wider range of SQL queries
The query optimizer chooses between three basic join strategies when determining how to run an SQL
query that includes a join. In many cases a hash join is the most efficient method, and with this release it
can be used in more situations.
Data type mismatches
A hash join will now be considered even if the two columns in the join are not the same data type. This
is the case in all but the most extreme situations.
Expressions used in join predicate
Join predicates that contain an expression no longer restrict the join method to a nested loop join. In
this release a hash join is considered in cases where the WHERE clause contains an expression, like:
WHERE T1.C1 = UPPER(T2.C3)
In these cases the hash join is considered automatically. There is no need to change any existing SQL
queries to take advantage of this improved functionality. Note that hash joins make use of sort heap
memory.
Joins
A join is the process of combining data from two or more tables based on some common domain of
information. Rows from one table are paired with rows from another table when information in the
corresponding rows match on the basis of the joining criterion (the join predicate).
For example, consider the following two tables:
TABLE1                 TABLE2
PROJ     PROJ_ID       PROJ_ID   NAME
A        1             1         Sam
B        2             3         Joe
C        3             4         Mary
D        4             1         Sue
                       2         Mike
To join TABLE1 and TABLE2, such that the PROJ_ID columns have the same values, use the following SQL
statement:
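The statement is not reproduced here; given the join predicate quoted in the next sentence, it was presumably of this form:

   select proj, x.proj_id, name
     from table1 x, table2 y
     where x.proj_id = y.proj_id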
In this case, the appropriate join predicate is: where x.proj_id = y.proj_id.
The query yields the following result set:
Depending on the nature of any join predicates, as well as any costs determined on the basis of table and
index statistics, the optimizer chooses one of the following join methods:
• Nested-loop join
• Merge join
• Hash join
Join methods
The optimizer can choose one of three basic join strategies when queries require tables to be joined:
nested-loop join, merge join, or hash join.
Nested-loop join
A nested-loop join is performed in one of the following two ways:
• Scanning the inner table for each accessed row of the outer table
For example, column A in table T1 and column A in table T2 have the following values:
To complete a nested-loop join between tables T1 and T2, the database manager performs the
following steps:
1. Read the first row in T1. The value for A is 2.
2. Scan T2 until a match (2) is found, and then join the two rows.
3. Repeat Step 2 until the end of the table is reached.
4. Go back to T1 and read the next row (3).
5. Scan T2 (starting at the first row) until a match (3) is found, and then join the two rows.
6. Repeat Step 5 until the end of the table is reached.
7. Go back to T1 and read the next row (3).
8. Scan T2 as before, joining all rows that match (3).
• Performing an index lookup on the inner table for each accessed row of the outer table
This method can be used if there is a predicate of the form:
expr relop inner_table.column
where relop is a relative operator (for example =, >, >=, <, or <=) and expr is a valid expression on the
outer table. For example:
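For instance, with T1 as the outer table and T2 as the inner table (a sketch; the table and column names are
assumptions), a predicate such as the following lets an index on T2.C1 be probed for each outer row:
WHERE T1.C1 + T1.C2 <= T2.C1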
This method might significantly reduce the number of rows that are accessed in the inner table for each
access of the outer table; the degree of benefit depends on a number of factors, including the selectivity
of the join predicate.
Merge join
A merge join, sometimes known as a merge scan join or a sort merge join, requires a predicate of the form
table1.column = table2.column. This is called an equality join predicate. A merge join requires
ordered input on the joining columns, either through index access or by sorting. A merge join cannot be
used if the join column is a LONG field column or a large object (LOB) column.
In a merge join, the joined tables are scanned at the same time. The outer table of the merge join is
scanned only once. The inner table is also scanned once, unless repeated values occur in the outer table.
If repeated values occur, a group of rows in the inner table might be scanned again.
For example, column A in table T1 and column A in table T2 have the following values:
To complete a merge join between tables T1 and T2, the database manager performs the following steps:
1. Read the first row in T1. The value for A is 2.
2. Scan T2 until a match (2) is found, and then join the two rows.
3. Keep scanning T2 while the columns match, joining rows.
4. When the 3 in T2 is read, go back to T1 and read the next row.
5. The next value in T1 is 3, which matches T2, so join the rows.
6. Keep scanning T2 while the columns match, joining rows.
7. When the end of T2 is reached, go back to T1 to get the next row. Note that the next value in T1 is the
same as the previous value from T1, so T2 is scanned again, starting at the first 3 in T2. The database
manager remembers this position.
Hash join
A hash join requires one or more predicates of the form table1.columnX = table2.columnY. None of
the columns can be either a LONG field column or a LOB column.
A hash join is performed as follows: First, the designated inner table is scanned and rows are copied into
memory buffers that are drawn from the sort heap specified by the sortheap database configuration
parameter. The memory buffers are divided into sections, based on a hash value that is computed on the
columns of the join predicates. If the size of the inner table exceeds the available sort heap space, buffers
from selected sections are written to temporary tables.
When the inner table has been processed, the second (or outer) table is scanned and its rows are
matched with rows from the inner table by first comparing the hash value that was computed for the
columns of the join predicates. If the hash value for the outer row column matches the hash value for the
inner row column, the actual join predicate column values are compared.
Star-schema joins
The tables that are referenced in a query are almost always related by join predicates. If two tables are
joined without a join predicate, the Cartesian product of the two tables is formed. In a Cartesian product,
every qualifying row of the first table is joined with every qualifying row of the second table. This creates a
result table that is usually very large, because its size is the cross product of the size of the two source
tables. Because such a plan is unlikely to perform well, the optimizer avoids even determining the cost of
this type of access plan.
The only exceptions occur when the optimization class is set to 9, or in the special case of star schemas.
A star schema contains a central table called the fact table, and other tables called dimension tables. The
dimension tables have only a single join that attaches them to the fact table, regardless of the query. Each
dimension table contains additional values that expand information about a particular column in the fact
table. A typical query consists of multiple local predicates that reference values in the dimension tables
and contains join predicates connecting the dimension tables to the fact table. For these queries, it might
be beneficial to compute the Cartesian product of some of the dimension tables before accessing the fact
table.
Assuming that the ID column is a key in the EMPLOYEE table and that every employee has at most one
manager, this join avoids having to search for a subsequent matching row in the MANAGER table.
An early out join is also possible when there is a DISTINCT clause in the query. For example, consider the
following query that returns the names of car makers with models that sell for more than $30000.
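Such a query might look like the following (a sketch; the table and column names are assumptions):
SELECT DISTINCT MAKE.NAME
FROM MAKE, MODEL
WHERE MAKE.MAKE_ID = MODEL.MAKE_ID
AND MODEL.PRICE > 30000
As soon as one model priced above $30000 is found for a given maker, the join does not need to look for
further matching models for that maker.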
The qualifying set is the set of rows from the DAILYSTOCKDATA table that satisfies the date and price
requirements and joins with a particular stock symbol from the SP500 table. If the qualifying set from the
DAILYSTOCKDATA table (for each stock symbol row from the SP500 table) is ordered as descending on
DATE, it is only necessary to return the first row from the qualifying set for each symbol, because that first
row represents the most recent date for a particular symbol. The other rows in the qualifying set are not
required.
Composite tables
When the result of joining a pair of tables is a new table (known as a composite table), this table usually
becomes the outer table of a join with another inner table. This is known as a composite outer join. In
some cases, particularly when using the greedy join enumeration technique, it is useful to make the result
of joining two tables the inner table of a later join. When the inner table of a join consists of the result of
joining two or more tables, this plan is known as a composite inner join. For example, consider the
following query:
select count(*)
from t1, t2, t3, t4
where
t1.a = t2.a and
t3.a = t4.a and
t2.z = t3.z
It might be beneficial to join table T1 and T2 (T1xT2), then join T3 and T4 (T3xT4), and finally, to select
the first join result as the outer table and the second join result as the inner table. In the final plan
( (T1xT2) x (T3xT4) ), the join result (T3xT4) is known as a composite inner join. Depending on the query
optimization class, the optimizer places different constraints on the maximum number of tables that can
be the inner table of a join. Composite inner joins are allowed with optimization classes 5, 7, and 9.
After using the REFRESH statement, you should invoke the runstats utility against the replicated table, as
you would against any other table.
The following query calculates sales by employee, the total for the department, and the grand total:
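A query of roughly the following shape fits this description (a sketch; the table and column names are
assumptions):
SELECT d.mgrno, e.empno, SUM(s.sales)
FROM department AS d, employee AS e, sales AS s
WHERE s.sales_person = e.lastname
AND e.workdept = d.deptno
GROUP BY ROLLUP (d.mgrno, e.empno)
ORDER BY d.mgrno, e.empno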
Instead of using the EMPLOYEE table, which resides on only one database partition, the database
manager uses R_EMPLOYEE, the MQT that is replicated on each of the database partitions on which the
SALES table is stored. The performance enhancement occurs because the employee information does not
have to be moved across the network to each database partition when performing the join.
Table queues
Descriptions of join techniques in a partitioned database environment use the following terminology:
• Table queue (sometimes referred to as TQ) is a mechanism for transferring rows between database
partitions, or between processors in a single-partition database.
• Directed table queue (sometimes referred to as DTQ) is a table queue in which rows are hashed to one
of the receiving database partitions.
• Broadcast table queue (sometimes referred to as BTQ) is a table queue in which rows are sent to all of
the receiving database partitions, but are not hashed.
A table queue is used to pass table data:
• From one database partition to another when using interpartition parallelism
• Within a database partition when using intrapartition parallelism
• Within a database partition when using a single-partition database
Each table queue passes the data in a single direction. The compiler decides where table queues are
required, and includes them in the plan. When the plan is executed, connections between the database
partitions initiate the table queues. The table queues close as processes end.
There are several types of table queues:
• Asynchronous table queues
These table queues are known as asynchronous, because they read rows in advance of any fetch
requests from an application. When a FETCH statement is issued, the row is retrieved from the table
queue.
Asynchronous table queues are used when you specify the FOR FETCH ONLY clause on the SELECT
statement. If you are only fetching rows, the asynchronous table queue is faster.
• Synchronous table queues
These table queues are known as synchronous, because they read one row for each FETCH statement
that is issued by an application. At each database partition, the cursor is positioned on the next row to
be read from that database partition.
Synchronous table queues are used when you do not specify the FOR FETCH ONLY clause on the
SELECT statement. In a partitioned database environment, if you are updating rows, the database
manager will use synchronous table queues.
• Merging table queues
These table queues preserve order.
• Non-merging table queues
These table queues, sometimes known as regular table queues, do not preserve order.
Collocated joins
A collocated join occurs locally on the database partition on which the data resides. The database
partition sends the data to the other database partitions after the join is complete. For the optimizer to
consider a collocated join, the joined tables must be collocated, and all pairs of the corresponding
distribution keys must participate in the equality join predicates. Figure 27 on page 251 provides an
example.
The LINEITEM and ORDERS tables are both partitioned on the ORDERKEY column. The join is performed
locally at each database partition. In this example, the join predicate is assumed to be:
orders.orderkey = lineitem.orderkey.
Replicated materialized query tables (MQTs) enhance the likelihood of collocated joins.
Broadcast outer-table joins
The ORDERS table is sent to all database partitions that have the LINEITEM table. Table queue q2 is
broadcast to all database partitions of the inner table.
Directed inner-table and outer-table joins
Neither table is partitioned on the ORDERKEY column. Both tables are hashed and sent to new database
partitions, where they are joined. Both table queues q2 and q3 are directed. In this example, the join
predicate is assumed to be: orders.orderkey = lineitem.orderkey.
Optimization strategies
The optimization strategies are dependent on the configuration of the Db2 environment. You need to be
aware of this configuration while you are designing performance improvements.
For this query, the optimizer can perform a dimension block index lookup to find blocks in which the
month of March and the SE region occur. Then it can scan only those blocks to quickly fetch the result set.
Rollout deletion
When conditions permit delete using rollout, this more efficient way to delete rows from MDC tables is
used. The required conditions are:
• The DELETE statement is a searched DELETE, not a positioned DELETE (the statement does not use the
WHERE CURRENT OF clause).
• There is no WHERE clause (all rows are to be deleted), or the only conditions in the WHERE clause apply
to dimensions.
• The table is not defined with the DATA CAPTURE CHANGES clause.
• The table is not the parent in a referential integrity relationship.
• The table does not have ON DELETE triggers defined.
• The table is not used in any MQTs that are refreshed immediately.
• A cascaded delete operation might qualify for rollout if its foreign key is a subset of the table's
dimension columns.
• The DELETE statement cannot appear in a SELECT statement executing against the temporary table
that identifies the set of affected rows prior to a triggering SQL operation (specified by the OLD TABLE
AS clause on the CREATE TRIGGER statement).
During a rollout deletion, the deleted records are not logged. Instead, the pages that contain the records
are made to look empty by reformatting parts of the pages. The changes to the reformatted parts are
logged, but the records themselves are not logged.
The default behavior, immediate cleanup rollout, is to clean up RID indexes at delete time. This mode can
also be specified by setting the DB2_MDC_ROLLOUT registry variable to IMMEDIATE, or by specifying
IMMEDIATE on the SET CURRENT MDC ROLLOUT MODE statement. There is no change in the logging of
index updates, compared to a standard delete operation, so the performance improvement depends on
how many RID indexes there are. The fewer RID indexes, the better the improvement, as a percentage of
the total time and log space.
An estimate of the amount of log space that is saved can be made with the following formula:
S + 38*N - 50*P
where N is the number of records deleted, S is total size of the records deleted, including overhead such
as null indicators and VARCHAR lengths, and P is the number of pages in the blocks that contain the
deleted records. This figure is the reduction in actual log data. The savings on active log space required is
double that value, due to the saving of space that was reserved for rollback.
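For example, for a hypothetical rollout delete of 10,000 records that total 1,000,000 bytes (including
overhead) and that reside in 300 pages, the estimated reduction in log data is:
1,000,000 + 38*10,000 - 50*300 = 1,365,000 bytes
and the active log space required is reduced by roughly twice that amount.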
Alternatively, you can have the RID indexes updated after the transaction commits, using deferred
cleanup rollout. This mode can also be specified by setting the DB2_MDC_ROLLOUT registry variable to
DEFER, or by specifying DEFERRED on the SET CURRENT MDC ROLLOUT MODE statement. In a deferred
rollout, RID indexes are cleaned up asynchronously in the background after the delete commits. This
method of rollout can result in significantly faster deletion times for very large deletes, or when a number
of RID indexes exist on the table. The speed of the overall cleanup operation is increased, because during
a deferred index cleanup, the indexes are cleaned up in parallel, whereas in an immediate index cleanup,
each row in the index is cleaned up one by one. Moreover, the transactional log space requirement for the
DELETE statement is significantly reduced, because the asynchronous index cleanup logs the index
updates by index page instead of by index key.
Assume that you are only interested in customer information for the year 2000.
As Figure 33 on page 262 shows, the database server determines that only one data partition in table
space TS4 must be accessed to resolve this query.
Without table partitioning, one likely plan is index ANDing. Index ANDing performs the following tasks:
• Reads all relevant index entries from each index
• Saves both sets of row identifiers (RIDs)
• Matches RIDs to determine which occur in both indexes
• Uses the RIDs to fetch the rows
Figure 34. Optimizer decision path for both table partitioning and index ANDing
Db2 Explain
You can also use the explain facility to determine the data partition elimination plan that was chosen by
the query optimizer. The "DP Elim Predicates" information shows which data partitions are scanned to
resolve the following query:
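The output that follows corresponds to a query of roughly this form (a sketch; the column name is an
assumption consistent with the predicates shown in the output):
SELECT * FROM custlist WHERE a >= '12/31/1999' AND a <= '01/01/2001'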
Arguments:
---------
DPESTFLG: (Number of data partitions accessed are Estimated)
FALSE
DPLSTPRT: (List of data partitions accessed)
9-11
DPNUMPRT: (Number of data partitions accessed)
3
DP Elim Predicates:
------------------
Range 1)
Stop Predicate: (Q1.A <= '01/01/2001')
Start Predicate: ('12/31/1999' <= Q1.A)
Schema: MRSRINI
Name: CUSTLIST
Type: Data Partitioned Table
Time of creation: 2005-11-30-14.21.33.857039
Last statistics update: 2005-11-30-14.21.34.339392
Number of columns: 3
Number of rows: 100000
Width of rows: 19
Number of buffer pool pages: 1200
Number of data partitions: 12
Distinct row values: No
Tablespace name: <VARIOUS>
The query optimizer deduces that only data partitions in TS1, TS2, and TS3 must be accessed to resolve
this query.
Note: In the case where multiple columns make up the table partitioning key, data partition elimination is
only possible when you have predicates on the leading columns of the composite key, because the non-
leading columns that are used for the table partitioning key are not independent.
Multi-range support
It is possible to obtain data partition elimination with data partitions that have multiple ranges (that is,
those that are ORed together). Using the SALES table that was created in the previous example, execute
the following query:
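For example, a query of the following general form would qualify (a sketch; the date column name is an
assumption):
SELECT * FROM sales
WHERE (sales_date >= '2001-01-01' AND sales_date <= '2001-03-31')
OR (sales_date >= '2002-10-01' AND sales_date <= '2002-12-31')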
The database server only accesses data for the first quarter of 2001 and the last quarter of 2002.
Generated columns
You can use generated columns as table partitioning keys. For example:
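A table definition along these lines illustrates the idea (a sketch; the generated-column expression a / 5 is
an assumption chosen to be consistent with the predicate translation described below):
CREATE TABLE sales (a INT, b INT GENERATED ALWAYS AS (a / 5))
PARTITION BY RANGE (b)
(STARTING FROM (0) ENDING AT (1000) EVERY (50))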
In this case, predicates on the generated column are used for data partition elimination. In addition, when
the expression that is used to generate the columns is monotonic, the database server translates
predicates on the source columns into predicates on the generated columns, which enables data partition
elimination on the generated columns. For example:
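For instance, with the table definition sketched above, consider a query with a predicate on the source
column a:
SELECT * FROM sales WHERE a > 35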
The database server generates an extra predicate on b (b > 7) from a (a > 35), thus allowing data partition
elimination.
Join predicates
Join predicates can also be used in data partition elimination, if the join predicate is pushed down to the
table access level. The join predicate is only pushed down to the table access level on the inner join of a
nested loop join (NLJN).
Consider the following tables:
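Assume that T1 is range partitioned on columns (A, B) across table spaces that include TS3, TS4, and TS6,
and that the two tables are joined by a query of roughly this form (a sketch consistent with the predicates
discussed below):
SELECT * FROM t1, t2 WHERE t1.a = t2.a AND t1.b > 15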
In this example, the exact data partitions that will be accessed cannot be determined at compile time,
because the outer values of the join are unknown. In this case, as well as cases where host variables or parameter
markers are used, data partition elimination occurs at run time when the necessary values are bound.
During run time, when T1 is the inner of an NLJN, data partition elimination occurs dynamically, based on
the predicates, for every outer value of T2.A. During run time, the predicates T1.A = 3 and T1.B > 15 are
applied for the outer value T2.A = 3, which qualifies the data partitions in table space TS6 to be accessed.
Suppose that column A in tables T1 and T2 have the following values:
Outer table T2:    Inner table T1:    Inner table T1:    Inner table T1:
column A           column A           column B           data partition location
2                  3                  20                 TS6
3                  2                  10                 TS3
3                  2                  18                 TS4
                   3                  15                 TS6
                   1                  40                 TS3
To perform a nested loop join (assuming a table scan for the inner table), the database manager performs
the following steps:
1. Reads the first row from T2. The value for A is 2.
2. Binds the T2.A value (which is 2) to the column T2.A in the join predicate T1.A = T2.A. The predicate
becomes T1.A = 2.
3. Applies data partition elimination using the predicates T1.A = 2 and T1.B > 15. This qualifies data
partitions in table space TS4.
4. After applying T1.A = 2 and T1.B > 15, scans the data partitions in table space TS4 of table T1 until a
row is found. The first qualifying row found is row 3 of T1.
5. Joins the matching row.
6. Scans the data partitions in table space TS4 of table T1 until the next match (T1.A = 2 and T1.B > 15)
is found. No more rows are found.
7. Repeats steps 1 through 6 for the next row of T2 (replacing the value of A with 3) until all the rows of
T2 have been processed.
The optimizer can immediately eliminate the first two partitions based on the predicate a > 21. If the
nonpartitioned index over XML data on column B is chosen by the optimizer in the query plan, an index
scan using the index over XML data will be able to take advantage of the data partition elimination result
from the optimizer and only return results belonging to partitions that were not eliminated by the
relational data partition elimination predicates.
The following queries can take advantage of the precomputed values in the dba.pg_salessum MQT:
• Sales by month and product group
• Total sales for the years after 1990
• Sales for 1995 or 1996
• The sum of sales for a specific product group or product line
• The sum of sales for a specific product group or product line in 1995 and 1996
• The sum of sales for a specific country, region, or territory
Example of a query that returns the total sales for 1995 and 1996
The following query obtains significant performance improvements because it uses the aggregated
data in the dba.pg_salessum MQT.
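For instance, a query of roughly this shape could be answered from the MQT (a sketch; the base table and
column names are assumptions, and routing requires that the MQT definition cover the query):
SELECT YEAR(s.sales_date) AS sales_year, SUM(s.amount) AS total_sales
FROM dba.sales s
WHERE YEAR(s.sales_date) IN (1995, 1996)
GROUP BY YEAR(s.sales_date)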
Explain facility
The Db2 explain facility provides detailed information about the access plan that the optimizer chooses
for an SQL or XQuery statement.
The information provided describes the decision criteria that are used to choose the access plan. The
information can also help you to tune the statement or your instance configuration to improve
performance. More specifically, explain information can help you with the following tasks:
• Understanding how the database manager accesses tables and indexes to satisfy your query.
• Evaluating your performance-tuning actions. After altering a statement or making a configuration
change, examine the new explain information to determine how your action has affected performance.
The captured information includes the following information:
• The sequence of operations that were used to process the query
• Cost information
• Predicates and selectivity estimates for each predicate
• Statistics for all objects that were referenced in the SQL or XQuery statement at the time that the
explain information was captured
• Values for host variables, parameter markers, or special registers that were used to reoptimize the SQL
or XQuery statement
The explain facility is invoked by issuing the EXPLAIN statement, which captures information about the
access plan chosen for a specific explainable statement and writes this information to explain tables. You
must create the explain tables prior to issuing the EXPLAIN statement. You can also set CURRENT
EXPLAIN MODE or CURRENT EXPLAIN SNAPSHOT, special registers that control the behavior of the
explain facility.
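For example, one common sequence is to create the explain tables, explain a statement, and then format
the captured information with the db2exfmt command (a sketch; the statement being explained is
arbitrary):
CALL SYSPROC.SYSINSTALLOBJECTS('EXPLAIN', 'C', CAST(NULL AS VARCHAR(128)), CAST(NULL AS VARCHAR(128)))
EXPLAIN PLAN FOR SELECT cust_last_name FROM gosalesct.cust_customer
The explain tables can then be formatted with, for example, db2exfmt -d <dbname> -1 -o plan.out.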
For privileges and authorities that are required to use the explain utility, see the description of the
EXPLAIN statement. The EXPLAIN authority can be granted to an individual who requires access to
explain information but not to the data that is stored in the database. This authority is a subset of the
database administrator authority and has no inherent privilege to access data stored in tables.
To display explain information, you can use a command-line tool. The tool that you use determines how
you set the special registers that control the behavior of the explain facility. If you expect to perform
detailed analysis with one of the command-line utilities or with custom SQL or XQuery statements against
the explain tables, capture all explain information.
In IBM Data Studio Version 3.1 or later, you can generate a diagram of the current access plan for an SQL
or XPATH statement. For more details, see Diagramming access plans with Visual Explain.
Object statistics
The explain facility records information about each object, such as the following:
• The creation time
• The last time that statistics were collected for the object
• Whether or not the data in the object is sorted (only table or index objects)
• The number of columns in the object (only table or index objects)
• The estimated number of rows in the object (only table or index objects)
• The number of pages that the object occupies in the buffer pool
• The total estimated overhead, in milliseconds, for a single random I/O to the specified table space
where the object is stored
• The estimated transfer rate, in milliseconds, to read a 4-KB page from the specified table space
• Prefetch and extent sizes, in 4-KB pages
• The degree of data clustering within the index
• The number of leaf pages that are used by the index for this object, and the number of levels in the tree
• The number of distinct full key values in the index for this object
• The total number of overflow records in the table
Operator properties
The following information that describes the properties of each operator is recorded by the explain
facility:
• The set of tables that have been accessed
• The set of columns that have been accessed
• The columns on which the data is ordered, if the optimizer has determined that this ordering can be
used by subsequent operators
• The set of predicates that have been applied
• The estimated number of rows that will be returned (cardinality)
Statement identification
More than one statement might have been explained for each explain instance. In addition to information
that uniquely identifies the explain instance, the following information helps to identify individual query
statements:
• The type of statement: SELECT, DELETE, INSERT, UPDATE, positioned DELETE, positioned UPDATE, or
SET INTEGRITY
• The statement and section number of the package issuing the statement, as recorded in the
SYSCAT.STATEMENTS catalog view
The QUERYTAG and QUERYNO fields in the EXPLAIN_STATEMENT table contain identifiers that are set as
part of the explain process. When EXPLAIN MODE or EXPLAIN SNAPSHOT is active, and dynamic explain
statements are submitted during a command line processor (CLP) or call-level interface (CLI) session, the
QUERYTAG value is set to "CLP" or "CLI", respectively. In this case, the QUERYNO value defaults to a
number that is incremented by one or more for each statement. For all other dynamic explain statements
that are not from the CLP or CLI, or that do not use the EXPLAIN statement, the QUERYTAG value is set to
blanks and QUERYNO is always 1.
Cost estimation
For each explained statement, the optimizer records an estimate of the relative cost of executing the
chosen access plan. This cost is stated in an invented relative unit of measure called a timeron. No
estimate of elapsed times is provided, for the following reasons:
• The query optimizer does not estimate elapsed time but only resource consumption.
• The optimizer does not model all factors that can affect elapsed time. It ignores factors that do not
affect the efficiency of the access plan. A number of runtime factors affect the elapsed time, including
the system workload, the amount of resource contention, the amount of parallel processing and I/O, the
cost of returning rows to the user, and the communication time between the client and server.
Procedure
The first part of this procedure shows how to identify the most CPU-intensive statement. Then, it shows
how to use the EXPLAIN_FROM_SECTION procedure to view the access plan information for that
statement as it will actually run.
1. Identify the statement that is using the most processor time:
SELECT SECTION_TYPE,
CASE
WHEN SUM(NUM_COORD_EXEC_WITH_METRICS) > 0 THEN
SUM(TOTAL_CPU_TIME)/SUM(NUM_COORD_EXEC_WITH_METRICS)
ELSE
0
END as AVG_CPU_TIME,
EXECUTABLE_ID,
VARCHAR(STMT_TEXT, 200) AS TEXT
FROM TABLE(MON_GET_PKG_CACHE_STMT ( 'D', NULL, NULL, -2)) as T
WHERE T.NUM_EXEC_WITH_METRICS <> 0 AND STMT_TYPE_ID LIKE 'DML%'
GROUP BY SECTION_TYPE, EXECUTABLE_ID, VARCHAR(STMT_TEXT, 200)
ORDER BY AVG_CPU_TIME DESC
The preceding SQL is written to avoid division by 0 when calculating the average processor time across
members. It also examines DML statements only, since the explain facility does not operate on DDL
statements.
The results of this query are as follows:
SECTION_TYPE AVG_CPU_TIME EXECUTABLE_ID
TEXT
------------ -------------------- -------------------------------------------------------------------
--------------------------------------------------
D 250000 x'01000000000000005F0000000000000000000000020020101108135629359000' select cust_last_name,
cust_cc_number, cust_intere
SQL0445W Value "SELECT POLICY FROM SYSTOOLS.POLICY WHERE MED='DB2TableMainte"
has been truncated. SQLSTATE=01004
2. Based on the output of the preceding query, use the EXPLAIN_FROM_SECTION procedure to generate
explain information from the section for the most CPU-intensive statement:
CALL EXPLAIN_FROM_SECTION (x'01000000000000005F0000000000000000000000020020101108135629359000' ,'M', NULL, 0,
NULL, ?, ?, ?, ?, ? )
3. You can now examine the explain information, either by examining the explain tables using SQL, or
using the db2exfmt command to format the information for easier reading.
For example, running db2exfmt -d gsdb -e db2docs -w 2010-11-08-13.57.52.984001 -n
SQLC2H21 -s NULLID -t -#0 against the explain information collected from the previous step
generates the following output:
DB2_VERSION: 09.07.2
SOURCE_NAME: SQLC2H21
SOURCE_SCHEMA: NULLID
SOURCE_VERSION:
EXPLAIN_TIME: 2010-11-08-13.57.52.984001
EXPLAIN_REQUESTER: DB2DOCS
Database Context:
----------------
Parallelism: None
CPU Speed: 8.029852e-007
Comm Speed: 100
Buffer Pool size: 21418
Sort Heap size: 6590
Database Heap size: 1196
Lock List size: 21386
Maximum Lock List: 97
Average Applications: 1
Locks Available: 663821
Package Context:
---------------
SQL Type: Dynamic
Optimization Level: 5
Blocking: Block All Cursors
Original Statement:
------------------
select cust_last_name, cust_cc_number, cust_interest_code
from gosalesct.cust_crdt_card C, gosalesct.cust_customer D,
gosalesct.cust_interest E
where C.cust_code=d.cust_code AND c.cust_code=e.cust_code
group by d.cust_last_name, c.cust_cc_number, e.cust_interest_code
order by d.cust_last_name ASC, c.cust_cc_number DESC, e.cust_interest_code
ASC
Optimized Statement:
-------------------
SELECT Q5.CUST_LAST_NAME AS "CUST_LAST_NAME", Q5.CUST_CC_NUMBER AS
"CUST_CC_NUMBER", Q5.CUST_INTEREST_CODE AS "CUST_INTEREST_CODE"
FROM
(SELECT Q4.CUST_LAST_NAME, Q4.CUST_CC_NUMBER, Q4.CUST_INTEREST_CODE
FROM
(SELECT Q2.CUST_LAST_NAME, Q3.CUST_CC_NUMBER, Q1.CUST_INTEREST_CODE
FROM GOSALESCT.CUST_INTEREST AS Q1, GOSALESCT.CUST_CUSTOMER AS Q2,
GOSALESCT.CUST_CRDT_CARD AS Q3
WHERE (Q3.CUST_CODE = Q1.CUST_CODE) AND (Q1.CUST_CODE = Q2.CUST_CODE))
AS Q4
GROUP BY Q4.CUST_INTEREST_CODE, Q4.CUST_CC_NUMBER, Q4.CUST_LAST_NAME) AS
Q5
ORDER BY Q5.CUST_LAST_NAME, Q5.CUST_CC_NUMBER DESC, Q5.CUST_INTEREST_CODE
Access Plan:
-----------
Total Cost: 1255.29
Query Degree: 1
Rows
RETURN
( 1)
Cost
I/O
|
31255
GRPBY
( 2)
1255.29
NA
|
31255
TBSCAN
( 3)
1249.02
NA
|
31255
SORT
( 4)
1242.74
NA
|
31255
^HSJOIN
( 5)
1134.96
NA
/---------+---------\
31255 31255
HSJOIN TBSCAN
( 6) ( 9)
406.871 716.136
NA NA
/------+-------\ |
⋮
Objects Used in Access Plan:
---------------------------
Schema: SYSIBM
Name: SQL101108113609000
Type: Index
Last statistics update: 2010-11-08-13.29.58.531000
Number of rows: -1
Number of buffer pool pages: -1
Distinct row values: Yes
Tablespace name: GOSALES_TS
Tablespace overhead: 7.500000
Tablespace transfer rate: 0.060000
Prefetch page count: 32
Container extent page count: 32
Index clustering statistic: 1.000000
Index leaf pages: 37
Index tree levels: 2
Index full key cardinality: 31255
Base Table Schema: GOSALESCT
Base Table Name: CUST_INTEREST
Columns in index:
CUST_CODE(A)
CUST_INTEREST_CODE(A)
Schema: GOSALESCT
Name: CUST_CRDT_CARD
Type: Table
Last statistics update: 2010-11-08-11.59.58.531000
Number of rows: 31255
Number of buffer pool pages: 192
Distinct row values: No
Tablespace name: GOSALES_TS
Tablespace overhead: 7.500000
Tablespace transfer rate: 0.060000
Prefetch page count: 32
Container extent page count: 32
Table overflow record count: 0
Table Active Blocks: -1
Average Row Compression Ratio: 0
Percentage Rows Compressed: 0
Average Compressed Row Size: 0
Schema: GOSALESCT
Name: CUST_CUSTOMER
Type: Table
Last statistics update: 2010-11-08-11.59.59.437000
Number of rows: 31255
Number of buffer pool pages: 672
Distinct row values: No
Tablespace name: GOSALES_TS
Tablespace overhead: 7.500000
Tablespace transfer rate: 0.060000
Prefetch page count: 32
Container extent page count: 32
Table overflow record count: 0
Table Active Blocks: -1
Average Row Compression Ratio: 0
Percentage Rows Compressed: 0
Average Compressed Row Size: 0
Schema: GOSALESCT
Name: CUST_INTEREST
Time of creation: 2010-11-08-11.30.28.203002
Last statistics update: 2010-11-08-13.29.58.531000
Number of rows: 31255
(The preceding output has had several lines removed for presentation purposes.)
What to do next
Analyze the explain output to see where there are opportunities to tune the query.
EXPLAIN_INSTANCE table
The following columns are set differently for the row generated by a section explain:
• EXPLAIN_OPTION is set to value S
• SNAPSHOT_TAKEN is always set to N
• REMARKS is always NULL
EXPLAIN_STATEMENT table
When a section explain has generated an explain output, the EXPLAIN_LEVEL column is set to value S. It
is important to note that the EXPLAIN_LEVEL column is part of the primary key of the table and part of the
foreign key of most other EXPLAIN tables; hence, this EXPLAIN_LEVEL value will also be present in those
other tables.
In the EXPLAIN_STATEMENT table, the remaining column values that are usually associated with a row
with EXPLAIN_LEVEL = P, are instead present when EXPLAIN_LEVEL = S, with the exception of
SNAPSHOT. SNAPSHOT is always NULL when EXPLAIN_LEVEL is S.
If the original statement was not available at the time the section explain was generated (for example, if
the statement text was not provided to the EXPLAIN_FROM_DATA procedure), STATEMENT_TEXT is set
to the string UNKNOWN when EXPLAIN_LEVEL is set to O.
In the db2exfmt output for a section explain, the following extra line is shown after the optimized
statement:
EXPLAIN_OPERATOR table
Considering all of the columns recording a cost, only the TOTAL_COST and FIRST_ROW_COST columns
are populated with a value after a section explain. All the other columns recording cost have a value of -1.
In the db2exfmt output for a section explain, the following differences are obtained:
• In the access plan graph, the I/O cost is shown as NA
EXPLAIN_PREDICATE table
No differences.
EXPLAIN_ARGUMENT table
A small number of argument types are not written to the EXPLAIN_ARGUMENT table when a section
explain is issued.
EXPLAIN_STREAM table
The following columns do not have values after a section explain:
• SINGLE_NODE
• PARTITION_COLUMNS
• SEQUENCE_SIZES
The following column always has a value of -1 after a section explain:
• PREDICATE_ID
The following columns have values only for streams that originate from a base table object; otherwise, they
default to no value and -1, respectively, after a section explain:
• COLUMN_NAMES
• COLUMN_COUNT
In the db2exfmt output for a section explain, the information from these listed columns is omitted from
the Input Streams and Output Streams section for each operator when they do not have values, or
have a value of -1.
EXPLAIN_OBJECT table
After issuing a section explain, the STATS_SRC column is always set to an empty string and the
CREATE_TIME column is set to NULL.
The following columns always have values of -1 after a section explain:
• COLUMN_COUNT
• WIDTH
• FIRSTKEYCARD
• FIRST2KEYCARD
• FIRST3KEYCARD
• FIRST4KEYCARD
• SEQUENTIAL_PAGES
• DENSITY
• AVERAGE_SEQUENCE_GAP
• AVERAGE_SEQUENCE_FETCH_GAP
• AVERAGE_SEQUENCE_PAGES
• AVERAGE_SEQUENCE_FETCH_PAGES
• AVERAGE_RANDOM_PAGES
• AVERAGE_RANDOM_FETCH_PAGES
• NUMRIDS
• NUMRIDS_DELETED
To enable section actuals for a specific application, use the WLM_SET_CONN_ENV procedure and specify
BASE for the section_actuals element. For example:
CALL WLM_SET_CONN_ENV(NULL,
'<collectactdata>WITH DETAILS, SECTION</collectactdata>
<collectsectionactuals>BASE</collectsectionactuals>
')
Note:
1. The setting of the section_actuals database configuration parameter that was in effect at the start
of the unit of work is applied to all statements in that unit of work. When the section_actuals
database configuration parameter is changed dynamically, the new value will not be seen by an
application until the next unit of work.
2. The section_actuals setting specified by the WLM_SET_CONN_ENV procedure for an application
takes effect immediately. Section actuals will be collected for the next statement issued by the
application.
3. Section actuals cannot be enabled if automatic statistics profile generation is enabled (SQLCODE
-5153).
In a partitioned database environment, section actuals are captured by an activity event monitor on all
partitions where the activity was executed, if the statement being executed has a COLLECT ACTIVITY
DATA clause applied to it and the COLLECT ACTIVITY DATA clause specifies both the SECTION keyword
and the ON ALL DATABASE PARTITIONS clause. If the ON ALL DATABASE PARTITIONS clause is not
specified, then actuals are captured on only the coordinator partition. In addition, besides the COLLECT
ACTIVITY DATA clause on a workload, service class, threshold, or work action, activity collection can be
enabled (for an individual application) using the WLM_SET_CONN_ENV procedure with a second
argument that includes the collectactdata tag with a value of "WITH DETAILS, SECTION".
Limitations
The limitations, with respect to the capture of section actuals, are the following:
• Section actuals will not be captured when the WLM_CAPTURE_ACTIVITY_IN_PROGRESS stored
procedure is used to send information about a currently executing activity to an activity event
monitor. Any activity event monitor record generated by the
WLM_CAPTURE_ACTIVITY_IN_PROGRESS stored procedure will have a value of 1 in its
partial_record column.
• When a reactive threshold has been violated, section actuals will be captured on only the
coordinator partition.
• Explain tables must be migrated to Db2 Version 9.7 Fix Pack 1, or later, before section actuals can
be accessed using a section explain. If the explain tables have not been migrated, the section
explain will work, but section actuals information will not be populated in the explain tables. In this
case, an entry will be written to the EXPLAIN_DIAGNOSTIC table.
• Existing Db2 V9.7 activity event monitor tables (in particular, the activity table) must be recreated
before section actuals data can be captured by the activity event monitor. If the activity logical
group does not contain the SECTION_ACTUALS column, a section explain may still be performed
using a section captured by the activity event monitor, but the explain will not contain any section
actuals data.
Procedure
To investigate poor query performance for a query executed by the myApp.exe application, complete the
following steps:
1. Enable section actuals:
2. Create the EXPLAIN tables in the MYSCHEMA schema using the SYSINSTALLOBJECTS procedure:
Note: This step can be skipped if you have already created the EXPLAIN tables.
3. Create a workload MYCOLLECTWL to collect activities submitted by the myApp.exe application and
enable collection of section data for those activities by issuing the following two commands:
Followed by:
Note: Choosing to use a separate workload limits the amount of information captured by the activity
event monitor.
4. Create an activity event monitor, called ACTEVMON, by issuing the following statement:
5. Activate the activity event monitor ACTEVMON.
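For illustration, statements along the following lines would accomplish steps 1 through 5 (a sketch; details
such as the workload attributes and the event monitor target are assumptions):
UPDATE DATABASE CONFIGURATION USING SECTION_ACTUALS BASE
CALL SYSINSTALLOBJECTS('EXPLAIN', 'C', CAST(NULL AS VARCHAR(128)), 'MYSCHEMA')
CREATE WORKLOAD MYCOLLECTWL APPLNAME('myApp.exe') COLLECT ACTIVITY DATA WITH DETAILS,SECTION
GRANT USAGE ON WORKLOAD MYCOLLECTWL TO PUBLIC
CREATE EVENT MONITOR ACTEVMON FOR ACTIVITIES WRITE TO TABLE
SET EVENT MONITOR ACTEVMON STATE 1
6. Run the myApp.exe application.
7. Query the activity event monitor tables to find the activity identifier information for the collected
statements. For example: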
SELECT APPL_ID,
UOW_ID,
ACTIVITY_ID,
STMT_TEXT
FROM ACTIVITYSTMT_ACTEVMON
The following is an example of the output that was generated as a result of the issued select
statement:
8. Use the activity identifier information as input to the EXPLAIN_FROM_ACTIVITY procedure to obtain
a section explain with actuals, as shown in the following call statement:
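For example (a sketch; the application ID, unit of work ID, and activity ID are placeholders for the values
returned by the previous query):
CALL EXPLAIN_FROM_ACTIVITY( '*LOCAL.DB2.100824113049', 1, 5, 'ACTEVMON', 'MYSCHEMA', ?, ?, ?, ?, ? )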
Return Status = 0
9. Format the explain data using the db2exfmt command and specifying, as input, the explain instance
key that was returned as output from the EXPLAIN_FROM_ACTIVITY procedure, such as the
following:
DB2_VERSION: 09.07.1
SOURCE_NAME: SQLC2H20
SOURCE_SCHEMA: NULLID
SOURCE_VERSION:
EXPLAIN_TIME: 2009-08-24-12.33.57.525703
EXPLAIN_REQUESTER: SWALKTY
Database Context:
----------------
Parallelism: None
CPU Speed: 4.000000e-05
Comm Speed: 0
Buffer Pool size: 198224
Sort Heap size: 1278
Database Heap size: 2512
Lock List size: 6200
Maximum Lock List: 60
Average Applications: 1
Locks Available: 119040
Package Context:
---------------
SQL Type: Dynamic
Optimization Level: 5
Blocking: Block All Cursors
Isolation Level: Cursor Stability
Original Statement:
------------------
select *
from syscat.tables
Optimized Statement:
-------------------
SELECT Q10.$C67 AS "TABSCHEMA", Q10.$C66 AS "TABNAME", Q10.$C65 AS "OWNER",
Q10.$C64 AS "OWNERTYPE", Q10.$C63 AS "TYPE", Q10.$C62 AS "STATUS",
Q10.$C61 AS "BASE_TABSCHEMA", Q10.$C60 AS "BASE_TABNAME", Q10.$C59 AS
"ROWTYPESCHEMA", Q10.$C58 AS "ROWTYPENAME", Q10.$C57 AS "CREATE_TIME",
Q10.$C56 AS "ALTER_TIME", Q10.$C55 AS "INVALIDATE_TIME", Q10.$C54 AS
"STATS_TIME", Q10.$C53 AS "COLCOUNT", Q10.$C52 AS "TABLEID", Q10.$C51
AS "TBSPACEID", Q10.$C50 AS "CARD", Q10.$C49 AS "NPAGES", Q10.$C48 AS
"FPAGES", Q10.$C47 AS "OVERFLOW", Q10.$C46 AS "TBSPACE", Q10.$C45 AS
"INDEX_TBSPACE", Q10.$C44 AS "LONG_TBSPACE", Q10.$C43 AS "PARENTS",
Q10.$C42 AS "CHILDREN", Q10.$C41 AS "SELFREFS", Q10.$C40 AS
"KEYCOLUMNS", Q10.$C39 AS "KEYINDEXID", Q10.$C38 AS "KEYUNIQUE",
Q10.$C37 AS "CHECKCOUNT", Q10.$C36 AS "DATACAPTURE", Q10.$C35 AS
"CONST_CHECKED", Q10.$C34 AS "PMAP_ID", Q10.$C33 AS "PARTITION_MODE",
'0' AS "LOG_ATTRIBUTE", Q10.$C32 AS "PCTFREE", Q10.$C31 AS
"APPEND_MODE", Q10.$C30 AS "REFRESH", Q10.$C29 AS "REFRESH_TIME",
...
Access Plan:
-----------
Total Cost: 154.035
Query Degree: 1
Rows
Rows Actual
RETURN
( 1)
Cost
I/O
|
54
396
>^HSJOIN
( 2)
153.056
NA
/----------+-----------\
54 20
396 0
>^HSJOIN TBSCAN
( 3) ( 12)
140.872 11.0302
NA NA
(continued below) |
20
NA
TABLE: SYSIBM
SYSAUDITPOLICIES
...
Rows
Rows Actual
RETURN
( 1)
Cost
I/O
|
3.21948 << The estimated rows that are used by the optimizer
301 << The actual rows that are collected at run time
DTQ
( 2)
75.3961
NA
|
3.21948
130
HSJOIN
( 3)
72.5927
NA
/--+---\
674 260
220 130
TBSCAN TBSCAN
( 4) ( 5)
40.7052 26.447
NA NA
| |
337 130
NA NA << Graph output does not include actuals for objects
In a partitioned database environment, the cardinality that is displayed in the graph is the average
cardinality for the database partitions where the actuals are collected. The average is displayed because
that is the value that is estimated by the optimizer. The actual average is a meaningful value to compare
against the estimated average. In addition, a breakdown of section actuals per database partition is
provided in the operator details output. You can examine these details to determine other information,
such as total (across all partitions), minimum, and maximum.
9) UNION : (Union)
Cumulative Total Cost: 10.6858
Cumulative First Row Cost: 9.6526
Arguments:
---------
UNIONALL: (UnionAll Parameterized Base Table)
DISJOINT
Input Streams:
-------------
5) From Operator #10
Output Streams:
--------------
8) To Operator #8
Schema: GOSALES
Name: ORDER_DETAILS
Type: Table
Member 0
---------
Metrics
-----------------
lock_wait_time:85899
lock_wait_time_global:25769
lock_waits_local:21474
lock_waits_global:85899
lock_escals_local:17179
lock_escals_global:2
direct_writes:12884
direct_read_reqs:1
pool_data_gbp_invalid_pages:446
pool_data_lbp_pages_found:445
pool_xda_l_reads:446
pool_xda_p_reads:15
Using access plans to self-diagnose performance problems with REFRESH TABLE and SET INTEGRITY
statements
Invoking the explain utility against REFRESH TABLE or SET INTEGRITY statements enables you to
generate access plans that can be used to self-diagnose performance problems with these statements.
This can help you to better maintain your materialized query tables (MQTs).
To get the access plan for a REFRESH TABLE or a SET INTEGRITY statement, use either of the following
methods:
• Use the EXPLAIN PLAN FOR REFRESH TABLE or EXPLAIN PLAN FOR SET INTEGRITY option on the
EXPLAIN statement.
• Set the CURRENT EXPLAIN MODE special register to EXPLAIN before issuing the REFRESH TABLE or
SET INTEGRITY statement, and then set the CURRENT EXPLAIN MODE special register to NO
afterwards.
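For example, using the special register method (a sketch; SALES_MQT is an assumed MQT name):
SET CURRENT EXPLAIN MODE EXPLAIN
REFRESH TABLE sales_mqt
SET CURRENT EXPLAIN MODE NO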
Restrictions
• The REFRESH TABLE and SET INTEGRITY statements do not qualify for re-optimization; therefore, the
REOPT explain mode (or explain snapshot) is not applicable to these two statements.
• The WITH REOPT ONCE clause of the EXPLAIN statement, which indicates that the specified
explainable statement is to be re-optimized, is not applicable to the REFRESH TABLE and SET
INTEGRITY statements.
Scenario
This scenario shows how you can generate and use access plans from EXPLAIN and REFRESH TABLE
statements to self-diagnose the cause of your performance problems.
1. Create and populate your tables. For example:
create table t (
i1 int not null,
i2 int not null,
primary key (i1)
);
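Step 2 would then issue the explain request; for example (a sketch; the MQT definition is an assumption), it
might create an MQT over table T and explain its refresh:
create table mqt as (select i2, count(*) as cnt from t group by i2)
data initially deferred refresh deferred;
explain plan for refresh table mqt;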
3. Use the db2exfmt command to format the contents of the explain tables and obtain the access plan.
This tool is located in the misc subdirectory of the instance sqllib directory.
4. Analyze the access plan to determine the cause of the performance problem. In the previous example,
if T is a large table, a table scan would be very expensive. Creating an index might improve the
performance of the query.
Rows
RETURN
( 1)
Cost
I/O
|
1
CTQ
( 2)
41.3466
6
|
1
NLJOIN
( 3)
41.3449
6
/----+-----\
1 1
CTQ TBSCAN
This plan is equivalent to a FETCH-IXSCAN combination that is used to access row-organized data. For
index access to column-organized data, row-organized data processing retrieves the rowid from the index
by using IXSCAN(5) and passes it to column-organized data processing using CTQ(4). CTQ(4) represents a
column-organized table queue that passes data from row-organized data processing to column-organized
data processing. TBSCAN(6) locates the columns that are identified by the rowid. TBSCAN(6) might apply
additional predicates if necessary, or reapply the IXSCAN predicates in some situations. Specifically, if the
table is being accessed under the UR isolation level, or the access is in support of an update or delete
operation, the TBSCAN needs to apply only those predicates that were not already applied by the
IXSCAN. Otherwise, the TBSCAN needs to reapply all of the IXSCAN predicates. NLJOIN(3) represents
the process of retrieving the rowid from row-organized data processing and passing it to the column-
organized TBSCAN.
The FETCH operator for column-organized index scans replaces the use of the previous nested-loop join
method for isolation level CS, when a modification state index exists for the table. The nested-loop join
method is still used for isolation level UR, if the following conditions are met:
• The result of the index access is joined to another column-organized table
• The index scan returns at most one row
Otherwise, the FETCH operator is used. A column-organized FETCH can process any number of rows,
while the nested-loop join representation of a fetch operation is limited to processing no more than one
row. A column-organized FETCH can apply search argument predicates (sargable) and residual predicates
just like a row-organized FETCH. A column-organized FETCH can also be used in all of the same contexts
as a row-organized FETCH, including the inner loop of a nested-loop join and in correlated subselect
queries.
The FETCH operator runs by using row-organized processing even though it is accessing column-
organized data. Data that is returned cannot be used for subsequent column-organized processing. The
Db2 query optimizer also considers accessing the column-organized table by using a table scan (TBSCAN
operator), if that is a less expensive option for processing such as:
• Joins
• Aggregation
• Removal of duplicate rows
• Sorting by using column-organized processing
The explain representation for a column-organized FETCH is similar to that of a row-organized FETCH,
except for arguments that are not applicable to column-organized processing. The example that is shown
previously for a query that uses an index in previous versions of Db2 would appear as the following when
a FETCH operator is used:
Rows
RETURN
( 1)
Cost
I/O
|
1
FETCH
( 2)
17.0787
1
Indexes on column-organized tables are not supported for the following index operations:
• Jump scans
• Deferred fetch index plans (index ANDing, ORing, and list prefetch)
• Star join and zigzag join
• Scan Sharing
Intra-partition parallel index scans are not supported for column-organized tables.
Update and delete operations that use an index scan on a column-organized table are not supported by
the FETCH operator. Update and delete operations that affect only a single row are supported by using
either index-only access or the nested-loop fetch approach.
Rows
RETURN
( 1)
Cost
I/O
|
25
LTQ
( 2)
2078.07
30
|
25
CTQ
( 3)
2074
30
|
25
^HSJOIN
( 4)
2073.79
30
/-+--\
25 25
TBSCAN TBSCAN
( 5) ( 9)
1036.85 1036.85
15 15
| |
25 25
TEMP TEMP
( 6) ( 6)
1033.24 1033.24
The execution plan includes the TEMP(6) operator, which materializes the results of the common table
expression during column-organized data processing. Operators TBSCAN(5) and TBSCAN(9) scan the
output of the TEMP(6) operator and send the data to the HSJOIN(4) operator. Afterward, the CTQ(3)
operator sends the results of the join operation from column-organized data processing to row-organized
data processing.
Column-organized sorts
The query optimizer determines where SORT operators are placed in the access plan based on query
semantics and costing. Column-organized sorts are used to satisfy ORDER BY requirements on subselects
and within OLAP specifications. They are also used to partition data for OLAP specifications
that include the PARTITION BY clause, to allow the OLAP function to be computed in parallel using
multiple database agents. For example:
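For instance, a query like the following (a sketch, using the TC1 table that appears in the later examples)
includes an OLAP specification with a PARTITION BY clause:
SELECT c1, c2, MAX(c1) OVER (PARTITION BY c2)
FROM tc1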
A column-organized sort is typically executed in parallel using multiple database agents and can use
different methods to distribute the data among the agents, depending on the semantics of the SQL
statement. The type of parallel sorting method is indicated by the SORTTYPE argument of the SORT
operator along with the sort key columns and the sort partitioning columns. The SORTTYPE argument can
have the values GLOBAL, PARTITIONED, or MERGE for a column-organized SORT.
A global sort is used when the SQL statement semantics require that the data be globally ordered to
satisfy an ORDER BY request, for example. The sorting will be performed by multiple database agents but
the final result will be produced by a single agent. For example, the following query has an ORDER BY on
columns C1 and C2.
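A query of this form (a sketch, again using the TC1 table) illustrates the case:
SELECT c1, c2
FROM tc1
ORDER BY c1, c2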
The access plan for this query has one SORT operator:
Rows
RETURN
( 1)
Cost
I/O
|
1000
LMTQ
( 2)
351.462
10
|
1000
CTQ
( 3)
289.471
10
|
1000
TBSCAN
The SORT operator arguments have SORTTYPE GLOBAL, indicating that it will produce a single stream of
sorted data. (Only arguments relevant to this discussion are shown):
Arguments:
---------
SORTKEY : (Sort Key column)
1: Q1.C1(A)
2: Q1.C2(A)
SORTTYPE: (Intra-Partition parallelism sort type)
GLOBAL
A partitioned sort is used when the sorted data can be used by multiple SQL operations, such as an
ORDER BY request and OLAP functions that require partitioned or ordered data. For example, the
following query contains 2 OLAP functions that require the data to be partitioned by (C2) and (C2,C3) and
the query also has an ORDER BY clause.
SELECT
c1,
c2,
c3,
MAX(c1) OVER (PARTITION BY c2),
MAX(c1) OVER (PARTITION BY c2, c3)
FROM
tc1
ORDER BY
c2
Rows
RETURN
( 1)
Cost
I/O
|
1000
LMTQ
( 2)
466.38
10
|
1000
CTQ
( 3)
411.188
10
|
1000
TBSCAN
( 4)
409.588
10
|
1000
SORT
( 5)
Arguments:
---------
PARTCOLS: (Table partitioning columns)
1: Q2.C2
SORTKEY : (Sort Key column)
1: Q2.C2(A)
2: Q2.C3(A)
SORTTYPE: (Intra-Partition parallelism sort type)
PARTITIONED
SORT(7) is partitioned by ranges of values of C2. Within each sort partition, the data is sorted by columns
C2 and C3. This allows both MAX functions to be computed in parallel by multiple agents. The parallel
streams with the MAX results are able to maintain order on C2, allowing the ORDER BY C2 to be satisfied
by simply merging the parallel streams. SORT(5) merges the streams from each agent to produce one
ordered stream, rather than performing a global sort of the input streams. This is indicated by SORTTYPE
MERGE in the explain arguments for SORT(5):
Arguments:
---------
SORTKEY : (Sort Key column)
1: Q3.C2(A)
SORTTYPE: (Intra-Partition parallelism sort type)
MERGE
If this same query didn't have an ORDER BY, a different type of parallel sort method is used.
SELECT
c1,
c2,
c3,
MAX(c1) OVER (PARTITION BY c2),
MAX(c1) OVER (PARTITION BY c2, c3)
FROM
tc1
Rows
RETURN
( 1)
Cost
I/O
|
1000
LTQ
( 2)
421.526
The explain arguments for SORT(5) indicate that it is a partitioned sort; however, the partitioning is done
differently than in the previous example:
Arguments:
---------
PARTCOLS: (Table partitioning columns)
1: Q2.C2
SORTKEY : (Sort Key column)
1: Q2.C2(R)
2: Q2.C3(A)
SORTTYPE: (Intra-Partition parallelism sort type)
PARTITIONED
Each sort output stream read by each agent contains a range of values for C2, but the data is not ordered
on C2 within each stream. Instead, the data is ordered on C3 within each distinct value of C2. For
example:
Since the window specification for both OLAP functions is PARTITION BY rather than ORDER BY, and the
outer sub-select doesn't have an ORDER BY clause, strict order on C2 and C3 does not need to be
produced by the SORT. Distinct values of C2 just need to be processed by the same database agent. For
example, all rows with values C2 = 5 must be processed by the same agent in order to properly determine
the maximum value of C1 for that group of values. Limiting the sorting to values of C3 within distinct
values of C2 reduces the memory required to perform the sort, and might also improve performance.
In the table access statements that are reported by db2expln:
• schema.name is the fully qualified name of the table that is being accessed
• ID is the corresponding TABLESPACEID and TABLEID from the SYSCAT.TABLES catalog view entry for
the table
Information about temporary tables includes one of the following table access statements:
where ID is the corresponding TABLESPACEID from the SYSCAT.TABLES catalog view entry for the table
(ts) or the corresponding identifier that is assigned by db2expln (tn).
After the table access statement, the following additional statements are provided to further describe
the access.
• Number of columns
• Block access
• Parallel scan
• Scan direction
• Row access
• Lock intent
• Predicate
• Miscellaneous
#Columns = n
If this statement does not appear, the table was created without the ORGANIZE BY DIMENSIONS clause.
Parallel Scan
If this statement does not appear, the table is read by only one agent (or subagent).
Relation Scan
| Prefetch: None
– The following statement indicates that the optimizer determined the number of pages that are
prefetched:
Relation Scan
| Prefetch: n Pages
Relation Scan
| Prefetch: Eligible
• The following statement indicates that qualifying rows are being identified and accessed through an
index:
where:
– schema.name is the fully qualified name of the index that is being scanned
– ID is the corresponding IID column in the SYSCAT.INDEXES catalog view
– Index type is one of:
This is followed by one line of output for each column in the index. Valid formats for this information are
as follows:
n: column_name (Ascending)
n: column_name (Descending)
n: column_name (Include Column)
The following statements are provided to clarify the type of index scan.
– The range-delimiting predicates for the index are shown by the following statements:
#Key Columns = n
| Start Key: xxxxx
| Stop Key: xxxxx
Only the first 20 characters of a literal string are displayed. A string that is longer than 20 characters is
indicated by an ellipsis (...) at the end of the string. Some keys cannot be determined until the
section is run; such keys are indicated by a question mark (?) as the value.
– Index-Only Access
If all of the needed columns can be obtained from the index key, this statement is displayed, and no
table data is accessed.
– The following statement indicates that no prefetching of index pages is done:
– The following statement indicates that for index prefetching sequential detection prefetching is
enabled and it shows the MAXPAGES value for this type of prefetching denoted by x:
– The following statement indicates that for index prefetching readahead prefetching is enabled:
– The following statement indicates that for index prefetching sequential detection and readahead
prefetching are enabled. It also shows the MAXPAGES value for sequential detection prefetching that
is denoted by x:
– The following statement indicates that for data prefetching sequential detection prefetching is
enabled and it shows the MAXPAGES value for this type of prefetching denoted by x:
– The following statement indicates that for data prefetching readahead prefetching is enabled:
– The following statement indicates that for data prefetching sequential detection and readahead
prefetching are enabled. It also shows the MAXPAGES value for sequential detection prefetching that
is denoted by x:
– If there are predicates that can be passed to the index manager to help qualify index entries, the
following statement is used to show the number of these predicates:
• When a statement indicates that qualifying rows are being identified and accessed through an index
with an expression-based key, db2expln shows detailed information about the index.
– This is followed by one line of output for each column in the index. Valid formats for this information
are as follows:
For example, if an index is created using the expressions upper(name), salary+bonus, id, then
db2expln returns the following output for the index:
| | Index Columns:
| | | 1: K00[UPPER(NAME)] (Ascending)
| | | 2: K01[SALARY+BONUS] (Ascending)
| | | 3: ID (Ascending)
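For reference, an index with these keys could be created with a statement along the following lines; the
table name EMPLOYEE, the column names NAME, SALARY, BONUS, and ID, and the index name are assumed
here for illustration only:
CREATE INDEX EMP_EXPR_IX
  ON EMPLOYEE (UPPER(NAME), SALARY+BONUS, ID)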
• If the qualifying rows are being accessed through row IDs (RIDs) that were prepared earlier in the
access plan, this is indicated by the following statement:
If the table has one or more block indexes that are defined on it, rows can be accessed by either block
or row IDs. This is indicated by the following statement:
Lock Intents
| Table: xxxx
| Row : xxxx
Predicate statements
There are three types of statement that provide information about the predicates that are used in an
access plan.
• The following statement indicates the number of predicates that are evaluated for each block of data
that is retrieved from a blocked index:
Block Predicate(s)
| #Predicates = n
• The following statement indicates the number of predicates that are evaluated as the data is retrieved:
Sargable Predicate(s)
| #Predicates = n
• The following statement indicates the number of predicates that will be evaluated after the data is
returned:
Residual Predicate(s)
| #Predicates = n
The number of predicates that are shown in these statements might not reflect the number of predicates
that are provided in the query statement, because predicates can be:
• Applied more than once within the same query
• Transformed and extended with the addition of implicit predicates during the query optimization
process
• Transformed and condensed into fewer predicates during the query optimization process
• The following statement indicates that only a single record is required:
Single Record
• The following statement appears when the isolation level that is used for table access is different from
the isolation level for the statement:
• The following statement indicates that the table has the volatile cardinality attribute set:
Volatile Cardinality
Insert Into Global Temp Table ID = ts,tn --> declared global temporary table
Insert Into Shared Global Temp Table ID = ts,tn --> declared global temporary table
The ID is an identifier that is assigned by db2expln for convenience when referring to the temporary
table. This ID is prefixed with the letter 't' to indicate that the table is a temporary table.
Each of these statements is followed by:
#Columns = n
which indicates how many columns there are in each row that is being inserted into the temporary table.
One of the following lines is displayed for each column in the sort key:
• The following statements provide estimates of the number of rows and the row size so that the optimal
sort heap can be allocated at run time:
• The following statement is displayed if only the first rows of the sorted result are needed:
• For sorts that are performed in a symmetric multiprocessor (SMP) environment, the type of sort that is
to be performed is indicated by one of the following statements:
• The following statements indicate whether or not the sorted result will be left in the sort heap:
Piped
Not Piped
Duplicate Elimination
• If aggregation is being performed during the sort operation, one of the following statements is
displayed:
Partial Aggregation
Intermediate Aggregation
Buffered Partial Aggregation
Buffered Intermediate Aggregation
Table functions
Table functions are user-defined functions (UDFs) that return data to the statement in the form of a table.
A table function is indicated by the following statements, which detail the attributes of the function. The
specific name uniquely identifies the table function that is invoked.
Hash Join
Merge Join
Nested Loop Join
In the case of a merge or nested loop join, the outer table of the join is the table that was referenced in
the previous access statement (shown in the output). The inner table of the join is the table that was
referenced in the access statement that is contained within the scope of the join statement. In the case of
a hash join, the access statements are reversed: the outer table is contained within the scope of the join,
and the inner table appears before the join.
In the case of a hash or merge join, the following additional statements might appear:
It is possible to apply predicates after a join has completed. This statement displays the number of
predicates being applied.
In the case of a hash join, the following additional statements might appear:
The hash table is built from the inner table. This statement displays if the building of the hash table was
pushed down into a predicate during access to the inner table.
• Process Probe Table For Hash Join
While accessing the outer table, a probe table can be built to improve the performance of the join. This
statement displays if a probe table was built during access to the outer table.
• Estimated Build Size: n
This statement displays the estimated number of bytes that are needed to build the hash table.
• Estimated Probe Size: n
This statement displays the estimated number of bytes that are needed to build the probe table.
In the case of a nested loop join, the following statement might appear immediately after the join
statement:
Piped Inner
This statement indicates that the inner table of the join is the result of another series of operations. This is
also referred to as a composite inner.
If a join involves more than two tables, the explain steps should be read from top to bottom. For example,
suppose the explain output has the following flow:
Access ..... W
Join
| Access ..... X
Join
| Access ..... Y
Join
| Access ..... Z
Data Stream n
All operations between these statements are considered to be part of the same data stream.
A data stream has a number of characteristics, and one or more statements can follow the initial data
stream statement to describe these characteristics:
• If the operation of the data stream depends on a value that is generated earlier in the access plan, the
data stream is marked with:
Correlated
• Similar to a sorted temporary table, the following statements indicate whether or not the results of the
data stream will be kept in memory:
Piped
Not Piped
A piped data stream might be written to disk if there is insufficient memory at execution time. The
access plan provides for both possibilities.
• The following statement indicates that only a single record is required from this data stream:
Single Record
When a data stream is accessed, the following statement will appear in the output:
Index ORing refers to the technique of accessing more than one index and combining the results to
include the distinct identifiers that appear in any of the indexes. The optimizer considers index ORing
when predicates are connected by OR keywords or there is an IN predicate.
• Either of the following statements indicates that input data was prepared for use during list prefetching:
• Index ANDing refers to the technique of accessing more than one index and combining the results to
include the identifiers that appear in all of the accessed indexes. Index ANDing begins with either of the
following statements:
Index ANDing
Block Index ANDing
If the optimizer has estimated the size of the result set, the estimate is shown with the following
statement:
Index ANDing filter operations process identifiers and use bit filter techniques to determine the
identifiers that appear in every accessed index. The following statements indicate that identifiers were
processed for index ANDing:
If the optimizer has estimated the size of the result set for a bitmap, the estimate is shown with the
following statement:
If list prefetching can be performed for any type of identifier preparation, it will be so indicated with the
following statement:
Prefetch: Enabled
Aggregation information
Aggregation is performed on rows satisfying criteria that are represented by predicates in an SQL
statement.
If an aggregate function executes, one of the following statements appears in db2expln output:
Aggregation
Predicate Aggregation
Partial Aggregation
Partial Predicate Aggregation
Hashed Partial Aggregation
Hashed Partial Predicate Aggregation
Intermediate Aggregation
Intermediate Predicate Aggregation
Final Aggregation
Final Predicate Aggregation
Group By
Column Function(s)
Single Record
The specific column function can be derived from the original SQL statement. A single record is fetched
from an index to satisfy a MIN or MAX operation.
If predicate aggregation has been performed, there is an aggregation completion operation and
corresponding output:
Aggregation Completion
Partial Aggregation Completion
Hashed Partial Aggregation Completion
Intermediate Aggregation Completion
Final Aggregation Completion
• When running an interpartition parallel plan, the section is broken into several subsections. Each
subsection is sent to one or more database partitions to be run. An important subsection is the
coordinator subsection. The coordinator subsection is the first subsection in every plan. It acquires
control first, and is responsible for distributing the other subsections and returning results to the calling
application.
– The distribution of subsections is indicated by the following statement:
Distribute Subsection #n
– The following statement indicates that the subsection will be sent to a database partition within the
database partition group, based on the value of the columns.
Directed by Hash
| #Columns = n
| Partition Map ID = n, Nodegroup = ngname, #Nodes = n
– The following statement indicates that the subsection will be sent to a predetermined database
partition. (This is common when the statement uses the DBPARTITIONNUM() scalar function.)
– The following statement indicates that the subsection will be sent to the database partition that
corresponds to a predetermined database partition number in the database partition group. (This is
common when the statement uses the HASHEDVALUE scalar function.)
– The following statement indicates that the subsection will be sent to the database partition that
provided the current row for the application's cursor.
Directed by Position
– Either of the following statements indicates that the subsection will be executed on the coordinator
database partition.
– The following statement indicates that the subsection will be sent to all of the listed database
partitions.
– The following statement indicates that only one database partition, determined as the statement is
executing, will receive the subsection.
• Table queues are used to move data between subsections in a partitioned database environment or
between subagents in a symmetric multiprocessor (SMP) environment.
– The following statements indicate that data is being inserted into a table queue:
– For database partition table queues, the destination for rows that are inserted into the table queue is
described by one of the following statements:
Each row is sent to the coordinator database partition:
Each row is sent to every database partition on which the given subsection is running:
Each row is sent to a database partition that is based on the values in the row:
Each row is sent to a database partition that is determined while the statement is executing:
– In some situations, a database partition table queue will have to overflow some rows to a temporary
table. This possibility is identified by the following statement:
– After a table access that includes a pushdown operation to insert rows into a table queue, there is a
"completion" statement that handles rows that could not be sent immediately. In this case, one of
the following lines is displayed:
– The following statements indicate that data is being retrieved from a table queue:
These statements are always followed by the number of columns being retrieved.
#Columns = n
– If the table queue sorts the rows at the receiving end, one of the following statements appears:
Output Sorted
Output Sorted and Unique
These statements are followed by the number of keys being used for the sort operation.
#Key Columns = n
For each column in the sort key, one of the following statements is displayed:
Key n: (Ascending)
Key n: (Descending)
– If predicates will be applied to rows at the receiving end of the table queue, the following statement
appears:
Residual Predicate(s)
| #Predicates = n
• Some subsections in a partitioned database environment explicitly loop back to the start of the
subsection, and the following statement is displayed:
If predicates are applied to data that is returned from a distributed subquery, the number of predicates
being applied is indicated by the following statements:
Residual Predicate(s)
| #Predicates = n
An insert, update, or delete operation that occurs at a data source is indicated by one of the following
statements:
Data definition language (DDL) statements against a data source are split into two parts. The part that is
invoked at the data source is indicated by the following statement:
If the federated server is a partitioned database, part of the DDL statement must be run at the catalog
database partition. This is indicated by the following statement:
• If the data source is relational, the SQL for the sub-statement is displayed as follows:
SQL Statement:
statement
Nicknames Referenced:
schema.nickname ID = n
If the data source is relational, the base table for the nickname is displayed as follows:
Base = baseschema.basetable
If the data source is non-relational, the source file for the nickname is displayed as follows:
• If values are passed from the federated server to the data source before executing the sub-statement,
the number of values is indicated by the following statement:
#Input Columns: n
• If values are passed from the data source to the federated server after executing the sub-statement,
the number of values is indicated by the following statement:
#Output Columns: n
DDL Statement
SET Statement
where n is the number of columns involved in obtaining distinct rows. To retrieve distinct row values,
the rows must first be sorted to eliminate duplicates. This statement will not appear if the database
manager does not have to explicitly eliminate duplicates, as in the following cases:
– A unique index exists and all of the columns in the index key are part of the DISTINCT operation
– Duplicates can be eliminated during sorting
• The following statement appears if a partial early distinct (PED) operation was performed to remove
many, if not all, duplicates. This reduces the amount of data that must be processed later in the query
evaluation.
• The following statement appears if the next operation is dependent on a specific record identifier:
Positioned Operation
If the positioned operation is against a federated data source, the statement becomes:
This statement appears for any SQL statement that uses the WHERE CURRENT OF syntax.
• The following statement appears if there are predicates that must be applied to the result but that could
not be applied as part of another operation:
• The following statement appears if the SQL statement contains a UNION operator:
UNION
• The following statement appears if there is an operation in the access plan whose sole purpose is to
produce row values for use by subsequent operations:
Table Constructor
| n-Row(s)
Table constructors can be used for transforming values in a set into a series of rows that are then
passed to subsequent operations. When a table constructor is prompted for the next row, the following
statement appears:
• The following statement appears if there is an operation that is only processed under certain conditions:
Conditional Evaluation
| Condition #n:
| #Predicates = n
| Action #n:
Conditional evaluation is used to implement such activities as the CASE statement, or internal
mechanisms such as referential integrity constraints or triggers. If no action is shown, then only data
manipulation operations are processed when the condition is true.
• One of the following statements appears if an ALL, ANY, or EXISTS subquery is being processed in the
access plan:
ANY/ALL Subquery
EXISTS Subquery
EXISTS SINGLE Subquery
• The following statement appears if rows are being returned to the application:
If the operation was pushed down into a table access, a completion phase statement appears in the
output:
• The following statement appears if one or more large object (LOB) locators are being freed:
Statement concentrator
The statement concentrator modifies dynamic SQL statements at the database server so that similar SQL
statements can share the access plan, thus improving performance. In Db2 Version 11.5 Mod Pack 4 and
later, Db2 provides two different mechanisms for statement concentration.
1. In online transaction processing (OLTP), simple statements might repeatedly be generated with
different literal values. In such workloads, the cost of recompiling the statements can add significant
overhead. The statement concentrator avoids this overhead by allowing compiled statements to be
reused, regardless of the values of the literals. The overhead that is associated with modifying the
incoming SQL statements for an OLTP workload is small when compared to the savings that are
realized by reusing statements that are in the package cache.
The statement concentrator is disabled by default. You can enable it for all dynamic statements in a
database by setting the stmt_conc database configuration parameter to LITERALS.
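As a sketch, the parameter can be set from the command line processor; the database name SAMPLE is a
placeholder:
UPDATE DATABASE CONFIGURATION FOR SAMPLE USING STMT_CONC LITERALS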
If a dynamic statement is modified as a result of statement concentration, both the original statement
and the modified statement are displayed in the explain output. The event monitor logical monitor
elements and output from the MON_GET_ACTIVITY_DETAILS table function show the original
statement if the statement concentrator modified the original statement text. Other monitoring
interfaces show only the modified statement text.
Consider the following example, in which the stmt_conc database configuration parameter is set to
LITERALS and the following two statements are issued:
SELECT FIRSTNME, LASTNAME FROM EMPLOYEE WHERE EMPNO='000020'
SELECT FIRSTNME, LASTNAME FROM EMPLOYEE WHERE EMPNO='000070'
These statements share the entry in the package cache, and that entry uses the following statement:
SELECT FIRSTNME, LASTNAME FROM EMPLOYEE WHERE EMPNO=:L0
The data server provides a value for :L0 (either '000020' or '000070'), based on the literal that was
used in the original statements.
The statement concentrator requires that the length attributes for VARCHAR and VARGRAPHIC string
literals be greater than the lengths of the string literals.
The statement concentrator might cause some built-in functions to return different result types. For
example, REPLACE can return a different type when the statement concentrator is used. If the WORKDEPT
column is defined as CHAR(3), the following query returns VARCHAR(3) when the statement concentrator
is disabled:
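A query of the following general form illustrates the point; the single-character literal values are chosen
for illustration only:
SELECT REPLACE(WORKDEPT, 'A', 'B') FROM EMPLOYEE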
When stmt_conc=LITERALS, the two string literals are replaced with parameter markers and the
return type is VARCHAR(6).
Because statement concentration alters the statement text, statement concentration impacts access
plan selection. The statement concentrator works best when similar statements in the package cache
have similar access plans. If different literal values in a statement result in different access plans or
the value of a literal makes a significant difference in plan selection and execution time (for example, if
the presence of the literal allows an expression to match an expression-based index key), then do not
enable the statement concentrator for that statement.
2. Some applications append statement comments that begin with "--" (simple SQL comments). These
comments are not relevant to the compilation of the statement or to the access plan that is generated.
They might simply indicate a source application server or some other accounting information. When
these statements are processed in the dynamic SQL package cache, statements that are otherwise
identical, but have a different comment appended to the end, will by default end up as separate
entries in the package cache. This duplication of entries can reduce the capacity and hit ratio of the
package cache, and it results in extra compilation overhead for statements that are effectively
identical.
By setting stmt_conc to COMMENTS, Db2 strips these simple SQL comments from the statement text
before inserting it into the dynamic SQL cache. This ensures that all statements that are identical aside
from the contents of the comments share a single entry in the dynamic SQL package cache, which saves
space in the package cache and avoids unnecessary compilations. The comment that was stripped when
the statement was inserted into the package cache is saved in memory and is available as the
stmt_comments element in monitoring queries (for example, MON_GET_PKG_CACHE_STMT).
For example, if the following statements were issued:
SELECT FIRSTNME, LASTNAME FROM EMPLOYEE WHERE EMPNO=? -- issued from appserver ABC at
2024-11-10-12:05
SELECT FIRSTNME, LASTNAME FROM EMPLOYEE WHERE EMPNO=? -- issued from appserver CDE at
2024-11-09-08:07
SELECT FIRSTNME, LASTNAME FROM EMPLOYEE WHERE EMPNO=? -- issued from appserver QRZ at
2024-11-11-14:12
with stmt_conc set to COMMENTS, they would share a single entry in the dynamic SQL cache.
Setting stmt_conc to COMM_LIT performs both literal and comment concentration.
Note: The COMMENTS and COMM_LIT options are available in Db2 Version 11.5 Mod Pack 4 and later.
Optimization classes
When you compile an SQL or XQuery statement, you can specify an optimization class that determines
how the optimizer chooses the most efficient access plan for that statement.
The optimization classes differ in the number and type of optimization strategies that are considered
during the compilation of a query. Although you can specify optimization techniques individually to
improve runtime performance for the query, the more optimization techniques that you specify, the more
time and system resources query compilation will require.
You can specify one of the following optimization classes when you compile an SQL or XQuery statement.
0
This class directs the optimizer to use minimal optimization when generating an access plan, and has
the following characteristics:
• Frequent-value statistics are not considered by the optimizer.
• Only basic query rewrite rules are applied.
• Greedy join enumeration is used.
• Only nested loop join and index scan access methods are enabled.
• List prefetch is not used in generated access methods.
• The star-join strategy is not considered.
This class should only be used in circumstances that require the lowest possible query compilation
overhead. Query optimization class 0 is appropriate for an application that consists entirely of very
simple dynamic SQL or XQuery statements that access well-indexed tables.
1
This optimization class has the following characteristics:
Procedure
To specify a query optimization class:
1. Analyze performance factors.
The number of iterations represents the number of times that you expect the statement might be
executed each time that it is compiled.
Note: After initial compilation, dynamic SQL and XQuery statements are recompiled whenever a
change to the environment requires it. If the environment does not change after a statement is
cached, subsequent PREPARE statements reuse the cached statement.
• For static SQL and XQuery statements, compare the statement run times.
Although you might also be interested in the compilation time of static SQL and XQuery statements,
the total compilation and execution time for a static statement is difficult to assess in any
meaningful context. Comparing the total times does not recognize the fact that a static statement
can be executed many times each time that it is bound, and that such a statement is generally not
bound during run time.
2. Specify the optimization class.
• Dynamic SQL and XQuery statements use the optimization class that is specified by the CURRENT
QUERY OPTIMIZATION special register. For example, the following statement sets the optimization
class to 1:
SET CURRENT QUERY OPTIMIZATION = 1
To ensure that a dynamic SQL or XQuery statement always uses the same optimization class,
include a SET statement in the application program.
If the CURRENT QUERY OPTIMIZATION special register has not been set, dynamic statements are
bound using the default query optimization class. The default value for both dynamic and static
queries is determined by the value of the dft_queryopt database configuration parameter,
whose default value is 5. The default values for the bind option and the special register are also
read from the dft_queryopt database configuration parameter.
• Static SQL and XQuery statements use the optimization class that is specified on the PREP and
BIND commands. The QUERYOPT column in the SYSCAT.PACKAGES catalog view records the
optimization class that is used to bind a package. If the package is rebound, either implicitly or by
using the REBIND PACKAGE command, this same optimization class is used for static statements.
To change the optimization class for such static SQL and XQuery statements, use the BIND
command. If you do not specify the optimization class, the data server uses the default
optimization class, as specified by the dft_queryopt database configuration parameter.
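As a sketch of how the optimization class might be specified when binding a static package from the
command line processor (the bind file name is hypothetical):
db2 bind inventapp.bnd queryopt 3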
Using optimization profiles if other tuning options do not produce acceptable results
If you have followed best practices recommendations, but you believe that you are still getting less than
optimal performance, you can provide explicit optimization guidelines to the Db2 optimizer.
These optimization guidelines are contained in an XML document called the optimization profile. The
profile defines SQL statements and their associated optimization guidelines.
If you use optimization profiles extensively, they require a lot of effort to maintain. More importantly, you
can only use optimization profiles to improve performance for existing SQL statements. Following best
practices consistently can help you to achieve query performance stability for all queries, including future
ones.
In this case, an explicit optimization guideline can be used to influence the optimizer. For example:
<OPTGUIDELINES><IXSCAN TABLE="S" INDEX="I_SUPPKEY"/></OPTGUIDELINES>
Optimization guidelines are specified using a simple XML specification. Each element within the
OPTGUIDELINES element is interpreted as an optimization guideline by the Db2 optimizer. There is one
optimization guideline element in this example. The IXSCAN element requests that the optimizer use
index access. The TABLE attribute of the IXSCAN element indicates the target table reference (using the
exposed name of the table reference) and the INDEX attribute specifies the index.
The following example is based on the previous query, and shows how an optimization guideline can be
passed to the Db2 optimizer using an optimization profile.
<?xml version="1.0" encoding="UTF-8"?>
<OPTPROFILE VERSION="9.1.0.0">
<STMTPROFILE ID="Guidelines for SAMP Q9">
<STMTKEY SCHEMA="SAMP">
SELECT S.S_NAME, S.S_ADDRESS, S.S_PHONE, S.S_COMMENT
FROM PARTS P, SUPPLIERS S, PARTSUPP PS
WHERE P_PARTKEY = PS.PS_PARTKEY
AND S.S_SUPPKEY = PS.PS_SUPPKEY
AND P.P_SIZE = 39
AND P.P_TYPE = 'BRASS'
AND S.S_NATION = 'MOROCCO'
AND S.S_NATION IN ('MOROCCO', 'SPAIN')
AND PS.PS_SUPPLYCOST = (SELECT MIN(PS1.PS_SUPPLYCOST)
FROM PARTSUPP PS1, SUPPLIERS S1
WHERE P.P_PARTKEY = PS1.PS_PARTKEY
AND S1.S_SUPPKEY = PS1.PS_SUPPKEY
AND S1.S_NATION = S.S_NATION))
</STMTKEY>
<OPTGUIDELINES><IXSCAN TABLE="S" INDEX="I_SUPPKEY"/></OPTGUIDELINES>
</STMTPROFILE>
</OPTPROFILE>
Each STMTPROFILE element provides a set of optimization guidelines for one application statement. The
targeted statement is identified by the STMTKEY subelement. The optimization profile is then given a
schema-qualified name and inserted into the database. The optimization profile is put into effect for the
statement by specifying this name on the BIND or PRECOMPILE command.
<!--
Global optimization guidelines section.
Optional but at most one.
-->
<OPTGUIDELINES>
<MQT NAME="Test.AvgSales"/>
<MQT NAME="Test.SumSales"/>
</OPTGUIDELINES>
<!--
Statement profile section.
Zero or more can be specified.
-->
</OPTPROFILE>
Elements common to both the global optimization guidelines and statement profile sections
In addition to the OPTGUIDELINES element, the REGISTRY and STMTMATCH elements are available in
both of these sections:
• The REGISTRY element can set certain registry variables at either the statement or global level. The
REGISTRY element is nested in the OPTGUIDELINES element.
The REGISTRY element contains an OPTION element. The OPTION element has NAME and VALUE
attributes which are used to set the value of the named registry variable.
Procedure
To create an optimization profile:
1. Launch an XML editor. If possible, use one that has schema validation capability. The optimizer does
not perform XML validation. An optimization profile must be valid according to the current optimization
profile schema.
2. Create an XML document by using a name that makes sense to you. You might want to give it a name
that describes the scope of statements to which it applies. For example: inventory_db.xml
3. Add the XML declaration to the document. If you do not specify an encoding format, UTF-8 is
assumed. Save the document with UTF-16 encoding, if possible. The data server is more efficient
when processing this encoding.
<OPTPROFILE>
</OPTPROFILE>
What to do next
After you have created the XML document, configure the data server to use the optimization profile by
inserting the optimization profile into the SYSTOOLS.OPT_PROFILE table.
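One possible sketch of that insert, using placeholder schema, profile, and document values (the SCHEMA,
NAME, and PROFILE columns are described in the configuration procedure later in this section):
INSERT INTO SYSTOOLS.OPT_PROFILE (SCHEMA, NAME, PROFILE)
  VALUES ('NEWTON', 'INVENTDB',
          BLOB('<?xml version="1.0" encoding="UTF-8"?><OPTPROFILE> ... </OPTPROFILE>'))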
<REGISTRY>
<OPTION NAME='DB2_SELECTIVITY' VALUE='YES'/>
<OPTION NAME='DB2_REDUCED_OPTIMIZATION' VALUE='NO'/>
</REGISTRY>
To have OPTION elements apply to all statements in the application that uses this profile, include the
REGISTRY and OPTION elements in the global OPTGUIDELINES element.
To have OPTION elements apply to just a specific SQL statement, include the REGISTRY and OPTION
elements in the applicable statement-level STMTPROFILE element. Different STMTPROFILE elements
can have different OPTION element settings.
The following example shows registry variable settings at the application and statement level:
</OPTPROFILE>
Order of precedence
The example above sets the same registry variables in multiple places. In addition to these settings, the
db2set command can also be used to set registry variables. Registry variables are applied in the
following order of precedence, from highest to lowest. If both options 1 and 2 are specified, then only
option 1 is used. Otherwise, the combined effect of all options is used. If a particular registry variable is
set at more than one of these levels, the setting at the level with the highest precedence takes effect.
1. Statement level optimization profile settings, which are defined in a statement-level optimization
guideline
2. Embedded optimization guideline.
3. Overall optimization profile settings, which are defined in a global optimization guideline.
4. Registry variables set by the db2set command.
The following examples indicate which registry variable settings are used for various SQL statements
using the above optimization guidelines:
Variables in effect: DB2_REDUCED_OPTIMIZATION=NO, DB2_SELECTIVITY=YES
select t1.c1, count(*) from t1,t2 where t1.c1 = t2.c1 group by t1.c1
If the registry variables are set in different places, the registry variable with the highest precedence is the
only one displayed in the explain output.
Procedure
To configure the data server to use an optimization profile:
1. Create the optimization profile table (systools.opt_profile).
Each row of the optimization profile table can contain one optimization profile: the SCHEMA and NAME
columns identify the optimization profile, and the PROFILE column contains the text of the
optimization profile. The following example calls the SYSINSTALLOBJECTS procedure to create the
optimization profile table:
call sysinstallobjects('opt_profiles','c','','')
2. Optional: You can grant any authority or privilege on the systools.opt_profile table that satisfies
your database security requirements.
Granting authority or privilege on the systools.opt_profile table has no effect on the optimizer's
ability to read the table.
3. Create an input data file that contains three comma-separated string values, each enclosed in
double quotation marks. The first string value is the profile schema name. The second string value is
the profile name. The third string value is the optimization profile file name.
For example, you can create an input data file named PROFILEDATA that contains the following three
string values:
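A sketch of the contents of such a file; the schema and profile names are placeholders, and the file name
matches the earlier inventory_db.xml example:
"NEWTON","INVENTDB","inventory_db.xml"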
5. Enable the optimization profile with the CURRENT OPTIMIZATION PROFILE special register.
For example, you can incorporate the SET CURRENT OPTIMIZATION PROFILE statement in your
application:
Procedure
• To set an optimization profile within an application:
• Use the SET CURRENT OPTIMIZATION PROFILE statement anywhere within your application. For
example, the last statement in the following sequence is optimized according to the JON.SALES
optimization profile.
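A sketch of such a sequence; the first profile name and the table names are placeholders, while
JON.SALES is the profile named above:
SET CURRENT OPTIMIZATION PROFILE = 'NEWTON.INVENTDB'
SELECT * FROM PARTS      -- optimized according to NEWTON.INVENTDB
SET CURRENT OPTIMIZATION PROFILE = 'JON.SALES'
SELECT * FROM SALES      -- optimized according to JON.SALES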
• If you want the optimizer to use the default optimization profile that was in effect when the
application started running, specify the null value. For example:
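A minimal sketch of that form:
SET CURRENT OPTIMIZATION PROFILE = NULL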
• If you don't want the optimizer to use optimization profiles, specify the empty string. For example:
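And the corresponding empty-string form:
SET CURRENT OPTIMIZATION PROFILE = ''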
• If you are using a call level interface (CLI) application, you can add the
CURRENTOPTIMIZATIONPROFILE parameter to the db2cli.ini file, using the UPDATE CLI
CONFIGURATION command. For example:
[SANFRAN]
CURRENTOPTIMIZATIONPROFILE=JON.SALES
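A sketch of a command that would produce the db2cli.ini entry shown above, reusing the section and
profile names from that example:
db2 UPDATE CLI CFG FOR SECTION SANFRAN USING CURRENTOPTIMIZATIONPROFILE JON.SALES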
Note: Any SET CURRENT OPTIMIZATION PROFILE statements in the application override this
setting.
Procedure
• You can bind an optimization profile in SQLJ or embedded SQL using APIs (for example, sqlaprep) or
the command line processor (CLP).
For example, the following code shows how to bind an inventory database optimization profile to an
inventory application from the CLP:
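As a sketch only, such a bind might look like the following; the source file name and the profile name are
hypothetical, and OPTPROFILE is the command parameter referenced later in this section:
db2 prep inventapp.sqc bindfile optprofile NEWTON.INVENTDB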
If you do not specify a schema name for the optimization profile, the QUALIFIER command parameter
is used as the implicit qualifier.
Procedure
To modify an optimization profile:
1. Edit the optimization profile XML file on disk, make the necessary changes, and save the file.
2. Validate that the changed file is well-formed XML that conforms to the current optimization
profile schema (COPS), as defined in the DB2OptProfile.xsd file, which is located in the misc
subdirectory of the sqllib directory.
3. Update the existing row in the SYSTOOLS.OPT_PROFILE table with the new profile.
4. Ensure the new optimization profile is used:
• If you did not create triggers to flush the optimization profile cache, issue the FLUSH OPTIMIZATION
PROFILE CACHE statement. The statement removes any versions of the optimization profile that
might be contained in the optimization profile cache.
When you flush the optimization profile cache, any dynamic statements that were prepared with the
old optimization profile are also invalidated in the dynamic plan cache.
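A sketch of the statement for a single profile (the profile name is a placeholder); FLUSH OPTIMIZATION
PROFILE CACHE ALL removes all cached profiles:
FLUSH OPTIMIZATION PROFILE CACHE NEWTON.INVENTDB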
• If you have bound an optimization profile to a package of static statements, you must rebind the
package, using the OPTPROFILE command parameter again to specify the modified optimization
profile.
Results
Any subsequent reference to the optimization profile causes the optimizer to read the new profile and to
reload it into the optimization profile cache. Statements prepared under the old optimization profile are
logically invalidated. Calls made to those statements are prepared under the new optimization profile and
recached in the dynamic plan cache.
Procedure
To delete an optimization profile:
1. Delete the optimization profile from the SYSTOOLS.OPT_PROFILE table. For example:
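A sketch of such a delete, using placeholder schema and profile names:
DELETE FROM SYSTOOLS.OPT_PROFILE WHERE SCHEMA = 'NEWTON' AND NAME = 'INVENTDB'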
2. If you did not create triggers to flush the optimization profile cache, issue the FLUSH OPTIMIZATION
PROFILE CACHE statement to remove any versions of the optimization profile that might be contained
in the optimization profile cache. See “Triggers to flush the optimization profile cache” on page 387.
Note: When you flush the optimization profile cache, any dynamic statements that were prepared with
the old optimization profile are also invalidated in the dynamic plan cache.
Results
Any subsequent reference to the optimization profile causes the optimizer to return SQL0437W with
reason code 13.
This particular query rewrite optimization guideline specifies that the list of constants in the predicate
P_SIZE IN (35, 36, 39, 40) should be transformed into a table expression. This table expression
would then be eligible to drive an indexed nested-loop join access to the PARTS table in the main
subselect. The TABLE attribute is used to identify the target IN-LIST predicate by indicating the table
reference to which this predicate applies. If there are multiple IN-LIST predicates for the identified table
reference, the INLIST2JOIN rewrite request element is considered ambiguous and is ignored.
In such cases, a COLUMN attribute can be added to further qualify the target IN-LIST predicate. For
example:
The TABLE attribute of the INLIST2JOIN element identifies the PARTS table reference in the main
subselect. The COLUMN attribute is used to identify the IN-LIST predicate on the P_SIZE column as the
target. In general, the value of the COLUMN attribute can contain the unqualified name of the column
referenced in the target IN-LIST predicate. If the COLUMN attribute is provided without the TABLE
attribute, the query rewrite optimization guideline is considered invalid and is ignored.
The OPTION attribute can be used to enable or disable a particular query rewrite optimization guideline.
Because the OPTION attribute is set to DISABLE in the following example, the list of constants in the
predicate P_SIZE IN (35, 36, 39, 40) will not be transformed into a table expression. The default
value of the OPTION attribute is ENABLE. ENABLE and DISABLE must be specified in uppercase
characters.
<OPTGUIDELINES>
<INLIST2JOIN TABLE='P' COLUMN='P_SIZE' OPTION='DISABLE'/>
</OPTGUIDELINES>
In the following example, the INLIST2JOIN rewrite request element does not have a TABLE attribute. The
optimizer interprets this as a request to disable the IN-LIST-to-join query transformation for all IN-LIST
predicates in the statement.
<OPTGUIDELINES><INLIST2JOIN OPTION='DISABLE'/></OPTGUIDELINES>
The following example illustrates a subquery-to-join query rewrite optimization guideline, as represented
by the SUBQ2JOIN rewrite request element. A subquery-to-join transformation converts a subquery into
an equivalent table expression. The transformation applies to subquery predicates that are quantified by
EXISTS, IN, =SOME, =ANY, <>SOME, or <>ANY. The subquery-to-join query rewrite optimization guideline
does not ensure that a subquery will be merged. A particular subquery cannot be targeted by this query
rewrite optimization guideline. The transformation can only be enabled or disabled at the statement level.
Note: The enablement of a query transformation rule at the statement level does not ensure that the rule
will be applied to a particular part of the statement. The usual criteria are used to determine whether
query transformation will take place. For example, if there are multiple NOT EXISTS predicates in the
query block, the optimizer will not consider converting any of them into anti-joins. Explicitly enabling
query transformation at the statement level does not change this behavior.
The following example illustrates a NOT-IN-to-anti-join query rewrite optimization guideline, as
represented by the NOTIN2AJ rewrite request element. A NOT-IN-to-anti-join transformation converts a
subquery into a table expression that is joined to other tables using anti-join semantics (only
nonmatching rows are returned). The NOT-IN-to-anti-join query rewrite optimization guideline applies to
subquery predicates that are quantified by NOT IN. The NOT-IN-to-anti-join query rewrite optimization
guideline does not ensure that a subquery will be merged. A particular subquery cannot be targeted by
this query rewrite optimization guideline. The transformation can only be enabled or disabled at the
statement level.
A particular query rewrite optimization guideline might not be applicable when considered within the
context of other query rewrite transformations being applied to the statement. That is, if a guideline
request to enable a transform cannot be applied, a warning is returned. For example, an INLIST2JOIN
rewrite enable request element targeting a predicate that is eliminated from the query by another query
transformation would not be applicable. Moreover, the successful application of a query rewrite
optimization guideline might change the applicability of other query rewrite transformation rules. For
example, a request to transform an IN-LIST to a table expression might prevent a different IN-LIST from
being transformed to a table expression, because the optimizer will only apply a single IN-LIST-to-join
transformation per query block.
SQL statement:
Optimization guideline:
<OPTGUIDELINES>
<IXSCAN TABLE='S' INDEX='I_SUPPKEY'/>
</OPTGUIDELINES>
The following index scan access request element specifies that the optimizer is to use index access to the
PARTS table in the main subselect of the statement. The optimizer will choose the index in a cost-based
fashion, because there is no INDEX attribute. The TABLE attribute uses the qualified table name to refer
to the target table reference, because there is no associated correlation name.
<OPTGUIDELINES>
<IXSCAN TABLE='"Samp".PARTS'/>
</OPTGUIDELINES>
The following list prefetch access request is represented by the LPREFETCH access request element. This
particular request specifies that the optimizer is to use the I_SNATION index to access the SUPPLIERS
table in the nested subselect of the statement. The TABLE attribute uses the correlation name S1,
because that is the exposed name identifying the SUPPLIERS table reference in the nested subselect.
<OPTGUIDELINES>
<LPREFETCH TABLE='S1' INDEX='I_SNATION'/>
</OPTGUIDELINES>
The following index scan access request element specifies that the optimizer is to use the I_SNAME index
to access the SUPPLIERS table in the main subselect. The FIRST attribute specifies that this table is to be
the first table that is accessed in the join sequence chosen for the corresponding FROM clause. The FIRST
attribute can be added to any access or join request; however, there can be at most one access or join
request with the FIRST attribute referring to tables in the same FROM clause.
SQL statement:
Optimization guidelines:
<OPTGUIDELINES>
<IXSCAN TABLE='S' INDEX='I_SNAME' FIRST='TRUE'/>
</OPTGUIDELINES>
The following example illustrates how multiple access requests are passed in a single statement
optimization guideline. The TBSCAN access request element represents a table scan access request. This
particular request specifies that the SUPPLIERS table in the nested subselect is to be accessed using a
full table scan. The LPREFETCH access request element specifies that the optimizer is to use the
I_SUPPKEY index during list prefetch index access to the SUPPLIERS table in the main subselect.
<OPTGUIDELINES>
<TBSCAN TABLE='S1'/>
<LPREFETCH TABLE='S' INDEX='I_SUPPKEY'/>
</OPTGUIDELINES>
The following example illustrates a nested-loop join request, as represented by the NLJOIN join request
element. In general, a join request element contains two child elements. The first child element
represents the desired outer input to the join operation, and the second child element represents the
desired inner input to the join operation. The child elements can be access requests, other join requests,
or combinations of access and join requests. In this example, the first IXSCAN access request element
specifies that the PARTS table in the main subselect is to be the outer table of the join operation. It also
specifies that PARTS table access be performed using an index scan. The second IXSCAN access request
element specifies that the PARTSUPP table in the main subselect is to be the inner table of the join
operation. It, too, specifies that the table is to be accessed using an index scan.
<OPTGUIDELINES>
<NLJOIN>
<IXSCAN TABLE='"Samp".Parts'/>
<IXSCAN TABLE="PS"/>
</NLJOIN>
</OPTGUIDELINES>
The following example illustrates a hash join request, as represented by the HSJOIN join request
element. The ACCESS access request element specifies that the SUPPLIERS table in the nested subselect
is to be the outer table of the join operation. This access request element is useful in cases where
specifying the join order is the primary objective. The IXSCAN access request element specifies that the
PARTSUPP table in the nested subselect is to be the inner table of the join operation, and that the
optimizer is to choose an index scan to access that table.
<OPTGUIDELINES>
<HSJOIN>
<ACCESS TABLE='S1'/>
<IXSCAN TABLE='PS1'/>
</HSJOIN>
</OPTGUIDELINES>
The following example illustrates how larger join requests can be constructed by nesting join requests.
The example includes a merge join request, as represented by the MSJOIN join request element. The
outer input of the join operation is the result of joining the PARTS and PARTSUPP tables of the main
subselect, as represented by the NLJOIN join request element. The inner input of the join request
element is the SUPPLIERS table in the main subselect, as represented by the IXSCAN access request
element.
<OPTGUIDELINES>
<MSJOIN>
<NLJOIN>
If a join request is to be valid, all access request elements that are nested either directly or indirectly
inside of it must reference tables in the same FROM clause of the optimized statement.
<OPTGUIDELINES>
<MQTENFORCE NAME='SAMP.PARTSMQT'/>
<MQTENFORCE TYPE='REPLICATED'/>
</OPTGUIDELINES>
Note: If you specify more than one attribute at a time, only the first one is used. So in the following
example, only the MQT SAMP.PARTSMQT is enforced:
<MQTENFORCE NAME='SAMP.PARTSMQT' TYPE='REPLICATED'/>
Procedure
To create statement-level optimization guidelines:
1. Create the optimization profile in which you want to insert the statement-level guidelines. See
“Creating an optimization profile” on page 331.
2. Run the explain facility against the statement to determine whether optimization guidelines would be
helpful. Proceed if that appears to be the case.
3. Obtain the original statement by running a query that is similar to the following:
4. Edit the optimization profile and create a statement profile, inserting the statement text into the
statement key.
For example:
5. Insert statement-level optimization guidelines after the statement key. Use exposed names to identify
the objects that are referenced in access and join requests.
The following is an example of a join request:
<OPTGUIDELINES>
<HSJOIN>
<TBSCAN TABLE='PS1'/>
<IXSCAN TABLE='S1'
INDEX='I1'/>
</HSJOIN>
</OPTGUIDELINES>
Results
If expected results are not achieved, make changes to the guidelines or create additional guidelines, and
update the optimization profile, as appropriate.
TABLE attribute values that identify a table reference in the statement include '"Samp".PARTS', 'PARTS',
'Parts' (because the identifier is not delimited, it is converted to uppercase characters). TABLE attribute
values that fail to identify a table reference in the statement include '"Samp2".SUPPLIERS', 'PARTSUPP'
(not an exposed name), and 'Samp.PARTS' (the identifier Samp must be delimited; otherwise, it is
converted to uppercase characters).
The exposed name can be used to target any table reference in the original statement, view, SQL function,
or trigger.
Using exposed names in the original statement to identify table references in views
Optimization guidelines can use extended syntax to identify table references that are embedded in views,
as shown in the following example:
<OPTGUIDELINES>
<IXSCAN TABLE='A/"Rick".V1/A'/>
</OPTGUIDELINES>
The IXSCAN access request element specifies that an index scan is to be used for the EMPLOYEE table
reference that is embedded in the views "Gustavo".V2 and "Rick".V1. The extended syntax for identifying
table references in views is a series of exposed names separated by a slash character. The value of the
TABLE attribute A/"Rick".V1/A illustrates the extended syntax. The last exposed name in the sequence
(A) identifies the table reference that is a target of the optimization guideline. The first exposed name in
the sequence (A) identifies the view that is directly referenced in the original statement. The exposed
name or names in the middle ("Rick".V1) pertain to the view references along the path from the direct
view reference to the target table reference. The rules for referring to exposed names from optimization
guidelines, described in the previous section, apply to each step of the extended syntax.
Had the exposed name of the EMPLOYEE table reference in the view been unique with respect to all
tables that are referenced either directly or indirectly by the statement, the extended name syntax would
not be necessary.
Original statement:
<OPTGUIDELINES>
<HSJOIN>
<TBSCAN TABLE='S1'/>
<IXSCAN TABID='Q2'/>
</HSJOIN>
</OPTGUIDELINES>
Optimized statement:
This optimization guideline shows a hash join request, where the SUPPLIERS table in the nested
subselect is the outer table, as specified by the TBSCAN access request element, and where the
PARTSUPP table in the nested subselect is the inner table, as specified by the IXSCAN access request
element. The TBSCAN access request element uses the TABLE attribute to identify the SUPPLIERS table
reference using the corresponding exposed name in the original statement. The IXSCAN access request
element, on the other hand, uses the TABID attribute to identify the PARTSUPP table reference using the
unique correlation name that is associated with that table reference in the optimized statement.
If a single optimization guideline specifies both the TABLE and TABID attributes, they must identify the
same table reference, or the optimization guideline is ignored.
Note: There is currently no guarantee that correlation names in the optimized statement will be stable
when upgrading to a new release of the Db2 product.
create view v1 as
(select * from employee
where salary > (select avg(salary) from employee))
select * from v1
where deptno in ('M62', 'M63')
<OPTGUIDELINES>
<IXSCAN TABLE='V1/EMPLOYEE'/>
</OPTGUIDELINES>
The optimizer considers the IXSCAN access request ambiguous, because the exposed name EMPLOYEE is
not unique within the definition of view V1.
To eliminate the ambiguity, the view can be rewritten to use unique correlation names, or the TABID
attribute can be used. Table references that are identified by the TABID attribute are never ambiguous,
because all correlation names in the optimized statement are unique.
<OPTGUIDELINES>
<IXSCAN TABLE='"Samp".PARTS' INDEX='I_PTYPE'/>
<IXSCAN TABLE='"Samp".PARTS' INDEX='I_SIZE'/>
</OPTGUIDELINES>
Each of the IXSCAN elements references the "Samp".PARTS table in the main subselect.
When two or more guidelines refer to the same table reference, only the first is applied; all other
guidelines are ignored, and a warning is returned.
Only one INLIST2JOIN query rewrite request element at the predicate level can be enabled per query.
The following example illustrates an unsupported query rewrite optimization guideline, where two
IN-LIST predicates are enabled at the predicate level. Both guidelines are ignored, and a warning is
returned.
<OPTGUIDELINES>
<INLIST2JOIN TABLE='P' COLUMN='P_SIZE'/>
<INLIST2JOIN TABLE='P' COLUMN='P_TYPE'/>
</OPTGUIDELINES>
Procedure
To verify that a valid optimization guideline has been used:
1. Issue the EXPLAIN statement against the statement to which the guidelines apply.
If an optimization guideline was in effect for the statement using an optimization profile, the
optimization profile name appears as a RETURN operator argument in the EXPLAIN_ARGUMENT table.
If the optimization profile contained an SQL embedded optimization guideline or statement profile that
matched the explained statement, the STMTPROF argument also appears in the EXPLAIN_ARGUMENT
table.
If the optimization guideline is active and the explained statement matches the statement that is
contained in the STMTKEY element of the optimization guideline, a query that is similar to the previous
example produces output that is similar to the following output. The value of the STMTPROF argument
is the same as the ID attribute in the STMTPROFILE element.
TYPE VALUE
--------- --------------------------
OPT_PROF NEWTON.PROFILE1
STMTPROF Guidelines for SAMP Q9
Note:
1. You cannot create an index on a column-organized table, and only table scans can be used. Any table
access request that requires an index scan cannot be satisfied.
2. A column-organized table supports only HSJOIN and NLJOIN requests. Any join request that
references column-organized tables can be satisfied by retrieving the data from the tables and
performing the join by using row-organized data processing. If the requested join method is HSJOIN
or NLJOIN, a plan with HSJOIN or NLJOIN being pushed down to column-organized data processing
can also be used to satisfy the request, assuming they are eligible considering the type of join
predicates being applied. HSJOIN is only eligible if there is at least one equality join predicate, and
NLJOIN is only eligible if there are no equality join predicates.
3. A star join can contain only row-organized tables.
XML schemas for access plan and query optimization profiles and guidelines
Access plan and query optimization profiles are written as XML documents stored in the database. Each
component has a specified XML format, as defined in the schema, that must be used in order for an
optimization profile to be validated for use. You might find it helpful to use the samples as reference while
designing profiles.
<!--*****************************************************************************************-->
<!-- Global optimization guidelines supported in this version: -->
<!-- + MQTOptimizationChoices elements influence the MQTs considered by the optimizer. -->
<!-- + computationalPartitionGroupOptimizationsChoices elements can affect repartitioning -->
<!-- optimizations involving nicknames. -->
<!--****************************************************************************************-->
<!-- -->
<!-- Optimization guideline elements can be chosen from general requests, rewrite -->
<!-- requests, access requests, or join requests. -->
<!-- -->
<!-- General requests affect the search space which defines the alternative query -->
<!-- transformations, access methods, join methods, join orders, and other optimizations, -->
<!-- considered by the optimizer. -->
<!-- -->
<!-- Rewrite requests affect the query transformations used in determining the optimized -->
<!-- statement. -->
<!-- -->
<!-- Access requests affect the access methods considered by the cost-based optimizer, -->
<!-- and join requests affect the join methods and join order used in the execution plan. -->
<!-- -->
<!-- MQT enforcement requests specify semantically matchable MQTs whose usage in access -->
<!-- plans should be enforced regardless of cost estimates. -->
<!-- -->
<!--****************************************************************************************-->
<xs:element name="OPTGUIDELINES" type="optGuidelinesType"/>
<xs:complexType name="optGuidelinesType">
<xs:sequence>
<xs:group ref="generalRequest" minOccurs="0" maxOccurs="1"/>
<xs:choice maxOccurs="unbounded">
<xs:group ref="rewriteRequest" />
<xs:group ref="accessRequest"/>
<xs:group ref="joinRequest"/>
<xs:group ref="mqtEnforcementRequest"/>
</xs:choice>
</xs:sequence>
</xs:complexType>
<!--************************************************************************************* -->
<!-- Choices of general request elements. -->
<!-- REOPT can be used to override the setting of the REOPT bind option. -->
<!-- DPFXMLMOVEMENT can be used to affect the optimizer's plan when moving XML documents -->
<!-- between database partitions. The value can be NONE, REFERENCE or COMBINATION. The -->
<!-- default value is NONE. -->
<!--************************************************************************************* -->
<xs:group name="generalRequest">
<xs:sequence>
<xs:element name="REOPT" type="reoptType" minOccurs="0" maxOccurs="1"/>
<xs:element name="DEGREE" type="degreeType" minOccurs="0" maxOccurs="1"/>
<xs:element name="QRYOPT" type="qryoptType" minOccurs="0" maxOccurs="1"/>
<xs:element name="RTS" type="rtsType" minOccurs="0" maxOccurs="1"/>
<xs:element name="DPFXMLMOVEMENT" type="dpfXMLMovementType" minOccurs="0" maxOccurs="1"/>
</xs:sequence>
</xs:group>
<!--***********************************************************************************-->
<!-- Choices of rewrite request elements. -->
<!--***********************************************************************************-->
<xs:group name="rewriteRequest">
<xs:sequence>
<xs:element name="INLIST2JOIN" type="inListToJoinType" minOccurs="0"/>
<xs:element name="SUBQ2JOIN" type="subqueryToJoinType" minOccurs="0"/>
<xs:element name="NOTEX2AJ" type="notExistsToAntiJoinType" minOccurs="0"/>
<xs:element name="NOTIN2AJ" type="notInToAntiJoinType" minOccurs="0"/>
</xs:sequence>
</xs:group>
<!--************************************************************************************* -->
<!-- Choices for access request elements. -->
<!-- TBSCAN - table scan access request element -->
<!-- IXSCAN - index scan access request element -->
<!-- LPREFETCH - list prefetch access request element -->
<!-- IXAND - index ANDing access request element -->
<!-- IXOR - index ORing access request element -->
<!-- XISCAN - xml index access request element -->
<!-- XANDOR - XANDOR access request element -->
<!-- ACCESS - indicates the optimizer should choose the access method for the table -->
<!--************************************************************************************* -->
<xs:group name="accessRequest">
<xs:choice>
<xs:complexType name="degreeType">
<xs:attribute name="VALUE" type="intStringType"></xs:attribute>
</xs:complexType>
<!--*****************************************************************************************-->
<!-- Definition of DPF XML movement types -->
<!--**************************************************************************************** -->
<xs:complexType name="dpfXMLMovementType">
<xs:attribute name="VALUE" use="required">
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:enumeration value="REFERENCE"/>
<xs:enumeration value="COMBINATION"/>
</xs:restriction>
</xs:simpleType>
</xs:attribute>
</xs:schema>
XML Schema
<xs:element name="OPTPROFILE">
<xs:complexType>
<xs:sequence>
<!-- Global optimization guidelines section. -->
<!-- At most one can be specified. -->
<xs:element name="OPTGUIDELINES"
type="globalOptimizationGuidelinesType" minOccurs="0"/>
<!-- Statement profile section. Zero or more can be specified -->
<xs:element name="STMTPROFILE" type="statementProfileType"
minOccurs="0" maxOccurs="unbounded"/>
</xs:sequence>
<!-- Version attribute is currently optional -->
<xs:attribute name="VERSION" use="optional"/>
</xs:complexType>
</xs:element>
Description
The optional OPTGUIDELINES sub-element defines the global optimization guidelines for the
optimization profile. Each STMTPROFILE sub-element defines a statement profile. The VERSION attribute
identifies the current optimization profile schema against which a specific optimization profile was
created and validated.
XML Schema
<xs:complexType name="globalOptimizationGuidelinesType">
<xs:sequence>
<xs:group ref="MQTOptimizationChoices"/>
<xs:group ref="computationalPartitionGroupOptimizationChoices"/>
<xs:group ref="generalRequest"/>
<xs:group ref="mqtEnforcementRequest"/>
</xs:sequence>
</xs:complexType>
Description
Global optimization guidelines can be defined with elements from the groups MQTOptimizationChoices,
computationalPartitionGroupOptimizationChoices, generalRequest, or mqtEnforcementRequest.
• MQTOptimizationChoices group elements can be used to influence MQT substitution.
• computationalPartitionGroupOptimizationChoices group elements can be used to influence
computational partition group optimization, which involves the dynamic redistribution of data read from
remote data sources. It applies only to partitioned federated database configurations.
• The generalRequest group elements are not specific to a particular phase of the optimization process,
and can be used to change the optimizer's search space. They can be specified globally or at the
statement level.
• MQT enforcement requests specify semantically matchable materialized query tables (MQTs) whose
use in access plans should be enforced regardless of cost estimates.
XML Schema
<xs:group name="MQTOptimizationChoices">
<xs:choice>
<xs:element name="MQTOPT" minOccurs="0" maxOccurs="1">
<xs:complexType>
<xs:attribute name="OPTION" type="optionType" use="optional"/>
</xs:complexType>
</xs:element>
<xs:element name="MQT" minOccurs="0" maxOccurs="unbounded">
<xs:complexType>
<xs:attribute name="NAME" type="xs:string" use="required"/>
</xs:complexType>
</xs:element>
</xs:choice>
</xs:group>
Description
The MQTOPT element is used to enable or disable consideration of MQT optimization. The OPTION
attribute can take the value ENABLE (default) or DISABLE.
The NAME attribute of an MQT element identifies an MQT that is to be considered by the optimizer. The
rules for forming a reference to an MQT in the NAME attribute are the same as those for forming
references to exposed table names. If one or more MQT elements are specified, only those MQTs are
considered by the optimizer. The decision to perform MQT substitution using one or more of the specified
MQTs remains a cost-based decision.
Examples
The following example shows how to disable MQT optimization.
<OPTGUIDELINES>
<MQTOPT OPTION='DISABLE'/>
</OPTGUIDELINES>
The following example shows how to limit MQT optimization to the Samp.PARTSMQT table and the
COLLEGE.STUDENTS table.
<OPTGUIDELINES>
<MQT NAME='Samp.PARTSMQT'/>
<MQT NAME='COLLEGE.STUDENTS'/>
</OPTGUIDELINES>
XML Schema
<xs:group name="computationalPartitionGroupOptimizationChoices">
<xs:choice>
<xs:element name="PARTOPT" minOccurs="0" maxOccurs="1">
<xs:complexType>
<xs:attribute name="OPTION" type="optionType" use="optional"/>
</xs:complexType>
</xs:element>
<xs:element name="PART" minOccurs="0" maxOccurs="1">
<xs:complexType>
<xs:attribute name="NAME" type="xs:string" use="required"/>
</xs:complexType>
</xs:element>
</xs:choice>
</xs:group>
Description
The PARTOPT element is used to enable or disable consideration of computational partition group
optimization. The OPTION attribute can take the value ENABLE (default) or DISABLE.
The PART element can be used to specify the partition group that is to be used for computational partition
group optimization. The NAME attribute must identify an existing partition group. The decision to perform
dynamic redistribution using the specified partition group remains a cost-based decision.
Examples
The following example shows how to disable computational partition group optimization.
<OPTGUIDELINES>
<PARTOPT OPTION='DISABLE'/>
</OPTGUIDELINES>
The following example shows how to specify that the WORKPART partition group is to be used for
computational partition group optimization.
<OPTGUIDELINES>
<MQT NAME='Samp.PARTSMQT'/>
<PART NAME='WORKPART'/>
</OPTGUIDELINES>
XML Schema
<xs:complexType name="statementProfileType">
<xs:sequence>
<xs:element name="STMTMATCH" type="stmtMatchType" minOccurs="0"/>
<xs:element name="STMTKEY" type="statementKeyType"/>
<xs:element name="OPTGUIDELINES" type="optGuidelinesType"/>
</xs:sequence>
<xs:attribute name="ID" type="xs:string" use="optional"/>
</xs:complexType>
Description
A statement profile specifies optimization guidelines for a particular statement, and includes the
following parts:
• Statement matching
The statements in an optimization profile are matched either exactly or inexactly to the statements that
are being compiled. The EXACT attribute of the STMTMATCH element specifies which matching method
is applied.
• Statement key
XML Schema
<xs:complexType name="stmtMatchType">
<xs:attribute name="EXACT" type="boolType" use="optional" default="TRUE"/>
</xs:complexType>
Description
The optional EXACT attribute specifies the matching method. If the value is set to TRUE, exact matching
is applied. If the value is set to FALSE, inexact matching is applied. Exact matching is the default setting.
Example
The following example shows an STMTMATCH element definition at the statement level which enables
inexact matching for the statement in the STMTKEY element.
<STMTPROFILE ID='S1'>
<STMTMATCH EXACT='FALSE'/>
<STMTKEY>
<![CDATA[select t1.c1, count(*) from t1,t2 where t1.c1 = t2.c1 and t1.c1 >
0]]>
</STMTKEY>
...
</STMTPROFILE>
XML Schema
Description
The optional SCHEMA attribute can be used to specify the default schema part of the statement key.
Example
The following example shows a statement key definition that associates a particular statement with a
default schema of 'COLLEGE' and a function path of 'SYSIBM,SYSFUN,SYSPROC,DAVE'.
CDATA tagging (starting with <![CDATA[ and ending with ]]>) is necessary because the statement text
contains the special XML character '>'.
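A statement key of that form might look as follows (the SELECT statement is illustrative, and the
FUNCPATH attribute for the function path is an assumption; only the SCHEMA and function path values
are taken from the description above):
<STMTKEY SCHEMA='COLLEGE' FUNCPATH='SYSIBM,SYSFUN,SYSPROC,DAVE'>
<![CDATA[SELECT * FROM ORDERS WHERE FOO(ORDERS.ID) > 20]]>
</STMTKEY>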
Inexact matching
During compilation, if there is an active optimization profile, the compiling statements are matched either
exactly or inexactly with the statements in the optimization profile.
Inexact matching is used for flexible matching between the compiling statements and the statements
within the optimization profile. Inexact matching ignores literals, host variables, and parameter markers
when matching the compiling statement to the optimization profile statements. Therefore, you can
compile many different statements with different literal values in the predicate and the statements still
match. For example, the following statements match inexactly but they do not match exactly:
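For instance, the following pair is illustrative; the statements differ only in a literal value, so they match
inexactly but not exactly:
select c1 from t1 where c1 > 5
select c1 from t1 where c1 > 10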
Inexact matching is applied to both SQL and XQuery statements. However, string literals that are passed
as function parameters representing SQL or XQuery statements or statement fragments, including
individual column names, are not inexactly matched. XML functions such as XMLQUERY, XMLTABLE, and
XMLEXISTS that are used in an SQL statement are exactly matched. String literals could contain the
following items:
• A whole statement with SQL embedded inside XQuery, or XQuery embedded inside an SQL statement
• An identifier, such as a column name
• An XML expression that contains a search path
For XQuery, inexact matching ignores only the literals. The following literals are ignored in inexact
matching with some restrictions on the string literals:
• decimal literals
• double literals
• integer literals
• string literals that are not input parameters for functions: db2-fn:sqlquery, db2-fn:xmlcolumn,
db2-fn:xmlcolumn-contains
The following XQuery statements match if inexact matching is enabled:
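An illustrative pair (assuming a table CUSTOMER with an XML column INFO); the integer literals differ
but are ignored by inexact matching:
xquery db2-fn:xmlcolumn('CUSTOMER.INFO')/customer[id = 1]
xquery db2-fn:xmlcolumn('CUSTOMER.INFO')/customer[id = 2]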
Special registers are not supported for inexact matching. The following examples show some of the
types of statements that do not match with inexact matching:
• c1 in (c1, 1, 2)
c1 in (c2, 1, 2)
• A = 5
A = 5 + :hv
• with RR
with RS
<STMTPROFILE ID='S1'>
<STMTMATCH EXACT='TRUE'/>
<STMTKEY>
<![CDATA[select t1.c1, count(*) from t1,t2 where t1.c1 = t2.c1 and t1.c1 >
0]]>
</STMTKEY>
<OPTGUIDELINES>
<NLJOIN>
<TBSCAN TABLE='T1'/>
<TBSCAN TABLE='T2'/>
</NLJOIN>
</OPTGUIDELINES>
</STMTPROFILE>
<STMTPROFILE ID='S2'>
<STMTKEY><![CDATA[select * from T1 where c1 in( 10,20)]]>
</STMTKEY>
<OPTGUIDELINES>
<REGISTRY>
<OPTION NAME='DB2_REDUCED_OPTIMIZATION' VALUE='YES'/>
</REGISTRY>
</OPTGUIDELINES>
</STMTPROFILE>
</OPTPROFILE>
Order of precedence
In the example, the STMTMATCH element has been set at both the global level and the statement level.
Which matching method is applied therefore depends on the order of precedence. The following is the
order of precedence, from highest to lowest:
1. Statement level profile settings
2. Global level profile settings
c1 = ?
c1 = :hv1
c1 in (:hv2, :hv3);
c1 in ( ?, ?, ?, ?);
c1 in (c1, c2 );
The following statement fragment has different literals in the select list, but still matches:
select 1, c1 from t1
select 2, c1 from t1
The following statement fragment has a different subquery in the select list, but still matches:
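(An illustrative pair; the subqueries differ only in a literal value.)
select (select sum(c2) from t2 where t2.c3 = 5), c1 from t1
select (select sum(c2) from t2 where t2.c3 = 10), c1 from t1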
The following statement fragment has a different expression in the select list, but still matches:
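(An illustrative pair; the expressions differ only in a literal value.)
select c1 * 5, c2 from t1
select c1 * 10, c2 from t1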
The following statement fragment has different rows for the fetch clause, but still matches:
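(An illustrative pair.)
fetch first 10 rows only
fetch first 50 rows only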
The following statement fragment has different literal value for the having clause, but still matches:
having c1 > 0
having c1 > 10
Each of the following pairs of statement fragments either have different column positioning for the
order by clause or have different literal values for the expression in the order by clause, but still
match:
order by c1+1, c2 + 2, 4
order by c1+2, c2 + 3, 4
Each of the following pairs of statement fragments either have different literal values or host variables
for the set clause, but still match:
set c1 = 1
set c1 = 2
set queryno = 2
set queryno = 3
Each of the following pairs of statement fragments have different literal values for the group by
clause, but still match:
group by c1 + 1
group by c1 + 2
group by 1,2,3
group by 3,2,1
Each of the following pairs of statement fragments have different literal values for the values clause,
but still match:
values 1,2,3
values 3,4,5
decimal(c1, 5, 2)
decimal(c1, 9, 3)
Blob('%abc%')
Blob('cde%')
order by mod(c1, 2)
order by mod(c1, 4)
XML Schema
Description
The optGuidelinesType group defines the set of valid sub-elements of the OPTGUIDELINES element.
Each sub-element is interpreted as an optimization guideline by the Db2 optimizer. Sub-elements can be
categorized as either general request elements, rewrite request elements, access request elements, or
join request elements.
• General request elements are used to specify general optimization guidelines, which can be used to
change the optimizer's search space.
• Rewrite request elements are used to specify query rewrite optimization guidelines, which can be used
to affect the query transformations that are applied when the optimized statement is being determined.
• Access request elements and join request elements are plan optimization guidelines, which can be used
to affect access methods, join methods, and join orders that are used in the execution plan for the
optimized statement.
• MQT enforcement request elements specify semantically matchable materialized query tables (MQTs)
whose use in access plans should be enforced regardless of cost estimates.
Note: Optimization guidelines that are specified in a statement profile take precedence over those that
are specified in the global section of an optimization profile.
General optimization guidelines can be specified at both the global and statement levels. The description
and syntax of general optimization guideline elements is the same for both global optimization guidelines
and statement-level optimization guidelines.
Description
General request elements define general optimization guidelines, which affect the optimization search
space. Affecting the optimization search space can affect the applicability of rewrite and cost-based
optimization guidelines.
XML Schema
<xs:simpleType name="intStringType">
<xs:union>
<xs:simpleType>
<xs:restriction base="xs:integer">
<xs:minInclusive value="1"></xs:minInclusive>
<xs:maxInclusive value="32767"></xs:maxInclusive>
</xs:restriction>
</xs:simpleType>
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:enumeration value="ANY"/>
<xs:enumeration value="-1"/>
</xs:restriction>
</xs:simpleType>
</xs:union>
</xs:simpleType>
<xs:complexType name="degreeType">
<xs:attribute name="VALUE"
type="intStringType"></xs:attribute>
</xs:complexType>
Description
The DEGREE general request element has a required VALUE attribute that specifies the setting of the
DEGREE option. The attribute can take an integer value from 1 to 32 767 or the string value -1 or ANY.
The value -1 (or ANY) specifies that the degree of parallelism is to be determined by the data server. A
value of 1 specifies that the query should not use intrapartition parallelism.
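For example, the following guideline (a minimal sketch) requests that the degree of parallelism be
determined by the data server:
<OPTGUIDELINES>
<DEGREE VALUE='ANY'/>
</OPTGUIDELINES>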
DPFXMLMOVEMENT requests
The DPFXMLMOVEMENT general request element can be used in partitioned database environments to
override the optimizer's decision to choose a plan in which either a column of type XML is moved or only a
reference to that column is moved between database partitions. It is defined by the complex type
dpfXMLMovementType.
<xs:complexType name="dpfXMLMovementType">
<xs:attribute name="VALUE" use="required">
<xs:simpleType>
<xs:restriction base="xs:string"
<xs:enumeration value="REFERENCE"/>
<xs:enumeration value="COMBINATION"/>
</xs:restriction>
</xs:simpleType>
</xs:attribute>
</xs:complexType>
Description
In partitioned database environments, data must sometimes be moved between database partitions
during statement execution. In the case of XML columns, the optimizer can choose to move the actual
documents that are contained in those columns or merely a reference to the source documents on the
original database partitions.
The DPFXMLMOVEMENT general request element has a required VALUE attribute with the following
possible values: REFERENCE or COMBINATION. If a row that contains an XML column needs to be moved
from one database partition to another, the VALUE setting determines whether the optimizer favors a
plan that moves only references to the XML documents (REFERENCE) or a plan that can move either the
documents themselves or references to them (COMBINATION).
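For example, the following guideline (a minimal sketch) requests plans that move references rather than
the XML documents themselves:
<OPTGUIDELINES>
<DPFXMLMOVEMENT VALUE='REFERENCE'/>
</OPTGUIDELINES>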
QRYOPT requests
The QRYOPT general request element can be used to override the setting of the QUERYOPT bind
parameter, the value of the dft_queryopt database configuration parameter, or the result of a previous
SET CURRENT QUERY OPTIMIZATION statement. It is defined by the complex type qryoptType.
XML Schema
<xs:complexType name="qryoptType">
<xs:attribute name="VALUE" use="required">
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:enumeration value="0"/>
<xs:enumeration value="1"/>
<xs:enumeration value="2"/>
<xs:enumeration value="3"/>
<xs:enumeration value="5"/>
<xs:enumeration value="7"/>
<xs:enumeration value="9"/>
</xs:restriction>
</xs:simpleType>
</xs:attribute>
</xs:complexType>
Description
The QRYOPT general request element has a required VALUE attribute that specifies the setting of the
QUERYOPT option. The attribute can take any of the following values: 0, 1, 2, 3, 5, 7, or 9. For detailed
information about what these values represent, see "Optimization classes".
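For example, the following guideline (a minimal sketch) requests query optimization class 3:
<OPTGUIDELINES>
<QRYOPT VALUE='3'/>
</OPTGUIDELINES>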
REOPT requests
The REOPT general request element can be used to override the setting of the REOPT bind parameter,
which affects the optimization of statements that contain parameter markers or host variables. It is
defined by the complex type reoptType.
XML Schema
<xs:complexType name="reoptType">
<xs:attribute name="VALUE" use="required">
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:enumeration value="ONCE"/>
<xs:enumeration value="ALWAYS"/>
</xs:restriction>
</xs:simpleType>
</xs:attribute>
</xs:complexType>
Description
The REOPT general request element has a required VALUE attribute that specifies the setting of the
REOPT option. The attribute can take the value ONCE or ALWAYS. ONCE specifies that the statement
should be optimized for the first set of host variable or parameter marker values. ALWAYS specifies that
the statement should be optimized for each set of host variable or parameter marker values.
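For example, the following guideline (a minimal sketch) requests reoptimization for each set of host
variable or parameter marker values:
<OPTGUIDELINES>
<REOPT VALUE='ALWAYS'/>
</OPTGUIDELINES>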
XML Schema
<xs:complexType name="registryType">
<xs:sequence>
<xs:element name="OPTION" type="genericOptionType"
minOccurs="1" maxOccurs="unbounded"/>
</xs:sequence>
</xs:complexType>
<xs:complexType name="genericOptionType">
<xs:attribute name="NAME" type="xs:string" use="required"/>
<xs:attribute name="VALUE" type="xs:string" use="required"/>
</xs:complexType>
Description
The REGISTRY element sets registry variables at the global level, statement level, or both. The OPTION
element which is embedded in the REGISTRY element has a NAME and VALUE attribute. These attributes
specify the registry variable name and value that are applied to the profile or to specific statements in the
profile.
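For example, the following guideline (a minimal sketch that reuses the registry variable shown in the
earlier statement profile example) sets DB2_REDUCED_OPTIMIZATION for the scope in which the
guideline appears:
<OPTGUIDELINES>
<REGISTRY>
<OPTION NAME='DB2_REDUCED_OPTIMIZATION' VALUE='YES'/>
</REGISTRY>
</OPTGUIDELINES>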
RTS requests
The RTS general request element can be used to enable or disable real-time statistics collection. It can
also be used to limit the amount of time taken by real-time statistics collection.
For certain queries or workloads, it might be good practice to limit real-time statistics collection so that
extra overhead at statement compilation time can be avoided. The RTS general request element is
defined by the complex type rtsType.
<!--******************************************************************************************-->
<!-- RTS general request element to enable, disable or provide a time budget for -->
<!-- real-time statistics collection. -->
<!-- OPTION attribute allows enabling or disabling real-time statistics. -->
<!-- TIME attribute provides a time budget in milliseconds for real-time statistics collection.-->
<!--******************************************************************************************-->
<xs:complexType name="rtsType">
<xs:attribute name="OPTION" type="optionType" use="optional" default="ENABLE"/>
<xs:attribute name="TIME" type="xs:nonNegativeInteger" use="optional"/>
</xs:complexType>
Description
The RTS general request element has two optional attributes.
• The OPTION attribute is used to enable or disable real-time statistics collection. It can take the value
ENABLE (default) or DISABLE.
• The TIME attribute specifies the maximum amount of time (in milliseconds) that can be spent (per
statement) on real-time statistics collection at statement compilation time.
If ENABLE is specified for the OPTION attribute, automatic statistics collection and real-time statistics
must be enabled through their corresponding configuration parameters. Otherwise, the optimization
guideline will not be applied, and SQL0437W with reason code 13 is returned.
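For example, the following guideline (a minimal sketch) keeps real-time statistics collection enabled but
limits it to roughly 1000 milliseconds per statement:
<OPTGUIDELINES>
<RTS OPTION='ENABLE' TIME='1000'/>
</OPTGUIDELINES>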
XML Schema
<xs:group name="rewriteRequest">
<xs:sequence>
<xs:element name="INLIST2JOIN" type="inListToJoinType" minOccurs="0"/>
<xs:element name="SUBQ2JOIN" type="subqueryToJoinType" minOccurs="0"/>
<xs:element name="NOTEX2AJ" type="notExistsToAntiJoinType" minOccurs="0"/>
<xs:element name="NOTIN2AJ" type="notInToAntiJoinType" minOccurs="0"/>
</xs:sequence>
</xs:group>
Description
If the INLIST2JOIN element is used to specify both statement-level and predicate-level optimization
guidelines, the predicate-level guidelines override the statement-level guidelines.
XML Schema
<xs:complexType name="inListToJoinType">
<xs:attribute name="OPTION" type="optionType" use="optional" default="ENABLE"/>
<xs:attribute name="TABLE" type="xs:string" use="optional"/>
<xs:attribute name="COLUMN" type="xs:string" use="optional"/>
</xs:complexType>
Description
The INLIST2JOIN query rewrite request element has three optional attributes and no sub-elements. The
OPTION attribute can take the value ENABLE (default) or DISABLE. The TABLE and COLUMN attributes
are used to specify an IN-LIST predicate. If these attributes are not specified, or are specified with an
empty string ("") value, the guideline is handled as a statement-level guideline. If one or both of these
attributes are specified, it is handled as a predicate-level guideline. If the TABLE attribute is not specified,
or is specified with an empty string value, but the COLUMN attribute is specified, the optimization
guideline is ignored and SQL0437W with reason code 13 is returned.
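For example, the following predicate-level guideline (a sketch; the table and column names are
illustrative) enables the IN-LIST-to-join rewrite for the IN-LIST predicate on EMPLOYEE.EMPNO:
<OPTGUIDELINES>
<INLIST2JOIN TABLE='EMPLOYEE' COLUMN='EMPNO'/>
</OPTGUIDELINES>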
XML Schema
<xs:complexType name="notExistsToAntiJoinType">
<xs:attribute name="OPTION" type="optionType" use="optional" default="ENABLE"/>
</xs:complexType>
Description
The NOTEX2AJ query rewrite request element has one optional attribute and no sub-elements. The
OPTION attribute can take the value ENABLE (default) or DISABLE.
XML Schema
<xs:complexType name="notInToAntiJoinType">
<xs:attribute name="OPTION" type="optionType" use="optional" default="ENABLE"/>
</xs:complexType>
Description
The NOTIN2AJ query rewrite request element has one optional attribute and no sub-elements. The
OPTION attribute can take the value ENABLE (default) or DISABLE.
XML Schema
<xs:complexType name="subqueryToJoinType">
<xs:attribute name="OPTION" type="optionType" use="optional" default="ENABLE"/>
</xs:complexType>
Description
The SUBQ2JOIN query rewrite request element has one optional attribute and no sub-elements. The
OPTION attribute can take the value ENABLE (default) or DISABLE.
Access requests
The accessRequest group defines the set of valid access request elements. An access request specifies
an access method for a table reference.
XML Schema
<xs:group name="accessRequest">
<xs:choice>
<xs:element name="TBSCAN" type="tableScanType"/>
<xs:element name="IXSCAN" type="indexScanType"/>
<xs:element name="LPREFETCH" type="listPrefetchType"/>
<xs:element name="IXAND" type="indexAndingType"/>
<xs:element name="IXOR" type="indexOringType"/>
<xs:element name="XISCAN" type="indexScanType"/>
<xs:element name="XANDOR" type="XANDORType"/>
<xs:element name="ACCESS" type="anyAccessType"/>
</xs:choice>
</xs:group>
Description
• TBSCAN, IXSCAN, LPREFETCH, IXAND, IXOR, XISCAN, and XANDOR
These elements correspond to Db2 data access methods, and can only be applied to local tables that
are referenced in a statement. They cannot refer to nicknames (remote tables) or derived tables (the
result of a subselect).
• ACCESS
This element, which causes the optimizer to choose the access method, can be used when the join
order (not the access method) is of primary concern. The ACCESS element must be used when the
target table reference is a derived table. For XML queries, this element can also be used with attribute
TYPE = XMLINDEX to specify that the optimizer is to choose XML index access plans.
XML Schema
<xs:complexType name="extendedAccessType">
<xs:complexContent>
<xs:extension base="accessType">
<xs:sequence minOccurs="0">
<xs:element name="INDEX" type="indexType" minOccurs="2"
maxOccurs="unbounded"/>
</xs:sequence>
<xs:attribute name="INDEX" type="xs:string" use="optional"/>
<xs:attribute name="TYPE" type="xs:string" use="optional"
fixed="XMLINDEX"/>
<xs:attribute name="ALLINDEXES" type="boolType" use="optional"
fixed="TRUE"/>
</xs:extension>
</xs:complexContent>
</xs:complexType>
Description
All access request elements extend the complex type accessType. Each such element must specify the
target table reference using either the TABLE or TABID attribute. For information on how to form proper
table references from an access request element, see "Forming table references in optimization
guidelines".
Access request elements can also specify an optional FIRST attribute. If the FIRST attribute is specified,
it must have the value TRUE. Adding the FIRST attribute to an access request element indicates that the
execution plan should include the specified table as the first table in the join sequence of the
corresponding FROM clause. Only one access or join request per FROM clause can specify the FIRST
attribute. If multiple access or join requests targeting tables of the same FROM clause specify the FIRST
attribute, all but the first such request is ignored and a warning (SQL0437W with reason code 13) is
returned.
New optimizer guidelines enable you to influence the compiler's scan sharing decisions. In cases where
the compiler would have allowed sharing scans, wrapping scans, or throttling, specifying the appropriate
guideline will prevent sharing scans, wrapping scans, or throttling. A sharing scan can be seen by other
scans that are participating in scan sharing, and those scans can base certain decisions on that
information. A wrapping scan is able to start at an arbitrary point in the table to take advantage of pages
that are already in the buffer pool. A throttled scan has been delayed to increase the overall level of
sharing.
Valid optionType values (for the SHARING, WRAPPING, and THROTTLE attributes) are DISABLE and
ENABLE (the default). SHARING and WRAPPING cannot be enabled when the compiler chooses to disable
them. Using ENABLE will have no effect in those cases. THROTTLE can be either enabled or disabled.
Valid SHARESPEED values (to override the compiler's estimate of scan speed) are FAST and SLOW. The
default is to allow the compiler to determine values, based on its estimate.
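For example, the following guideline (a sketch; the table name is illustrative, and it assumes that these
attributes are accepted on a table scan access request) prevents the scan from being shared, wrapped,
or throttled:
<OPTGUIDELINES>
<TBSCAN TABLE='BIGTABLE' SHARING='DISABLE' WRAPPING='DISABLE' THROTTLE='DISABLE'/>
</OPTGUIDELINES>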
The only supported value for the TYPE attribute is XMLINDEX, which indicates to the optimizer that the
table must be accessed using one of the XML index access methods, such as IXAND, IXOR, XANDOR, or
XISCAN. If this attribute is not specified, the optimizer makes a cost-based decision when selecting an
access plan for the specified table.
XML Schema
<xs:complexType name="anyAccessType">
<xs:complexContent>
<xs:extension base="extendedAccessType"/>
</xs:complexContent>
</xs:complexType>
Description
The complex type anyAccessType is a simple extension of the abstract type extendedAccessType. No
new elements or attributes are added.
The TYPE attribute, whose only supported value is XMLINDEX, indicates to the optimizer that the table
must be accessed using one of the XML index access methods, such as IXAND, IXOR, XANDOR, or
XISCAN. If this attribute is not specified, the optimizer makes a cost-based decision when selecting an
access plan for the specified table.
The optional INDEX attribute can be used to specify an index name only if the TYPE attribute has a value
of XMLINDEX. If this attribute is specified, the optimizer might choose one of the following plans:
• An XISCAN plan using the specified index over XML data
• An XANDOR plan, such that the specified index over XML data is one of the indexes under XANDOR; the
optimizer will use all applicable indexes over XML data in the XANDOR plan
• An IXAND plan, such that the specified index is the leading index of IXAND; the optimizer will add more
indexes to the IXAND plan in a cost-based fashion
• A cost-based IXOR plan
The optional INDEX element can be used to specify two or more names of indexes as index elements only
if the TYPE attribute has a value of XMLINDEX. If this element is specified, the optimizer might choose
one of the following plans:
• An XANDOR plan, such that the specified indexes over XML data appear under XANDOR; the optimizer
will use all applicable indexes over XML data in the XANDOR plan
• An IXAND plan, such that the specified indexes are the indexes of IXAND, in the specified order
• A cost-based IXOR plan
If the INDEX attribute and the INDEX element are both specified, the INDEX attribute is ignored.
The optional ALLINDEXES attribute, whose only supported value is TRUE, can only be specified if the
TYPE attribute has a value of XMLINDEX. If this attribute is specified, the optimizer must use all
applicable relational indexes and indexes over XML data to access the specified table, regardless of cost.
The optimizer chooses one of the following plans:
• An XANDOR plan with all applicable indexes over XML data appearing under the XANDOR operator
• An IXAND plan with all applicable relational indexes and indexes over XML data appearing under the
IXAND operator
Examples
The following guideline is an example of an any access request:
<OPTGUIDELINES>
<HSJOIN>
<ACCESS TABLE='S1'/>
<IXSCAN TABLE='PS1'/>
</HSJOIN>
</OPTGUIDELINES>
The following example shows an ACCESS guideline specifying that some XML index access to the
SECURITY table should be used. The optimizer might pick any XML index plan, such as an XISCAN,
IXAND, XANDOR, or IXOR plan.
<OPTGUIDELINES>
<ACCESS TABLE='SECURITY' TYPE='XMLINDEX'/>
</OPTGUIDELINES>
The following example shows an ACCESS guideline specifying that all possible index access to the
SECURITY table should be used. The choice of method is left to the optimizer. Assume that two XML
indexes, SEC_INDUSTRY and SEC_SYMBOL, match the two XML predicates. The optimizer chooses either
the XANDOR or the IXAND access method using a cost-based decision.
<OPTGUIDELINES>
<ACCESS TABLE='SECURITY' TYPE='XMLINDEX' ALLINDEXES='TRUE'/>
</OPTGUIDELINES>
The following example shows an ACCESS guideline specifying that the SECURITY table should be
accessed using at least the SEC_INDUSTRY XML index. The optimizer chooses one of the following access
plans in a cost-based fashion:
• An XISCAN plan using the SEC_INDUSTRY XML index
• An IXAND plan with the SEC_INDUSTRY index as the first leg of the IXAND. The optimizer is free to use
more relational or XML indexes in the IXAND plan following cost-based analysis. If a relational index
were available on the TRANS_DATE column, for example, that index might appear as an additional leg
of the IXAND if that were deemed to be beneficial by the optimizer.
• A XANDOR plan using the SEC_INDUSTRY index and other applicable XML indexes
<OPTGUIDELINES>
<ACCESS TABLE='SECURITY' TYPE='XMLINDEX' INDEX='SEC_INDUSTRY'/>
</OPTGUIDELINES>
<xs:complexType name="indexAndingType">
<xs:complexContent>
<xs:extension base="extendedAccessType">
<xs:sequence minOccurs="0">
<xs:element name="NLJOIN" type="nestedLoopJoinType" minOccurs="1"
maxOccurs="unbounded"/>
</xs:sequence>
<xs:attribute name="STARJOIN" type="boolType" use="optional"/>
</xs:extension>
</xs:complexContent>
</xs:complexType>
Description
The complex type indexAndingType is an extension of extendedAccessType. When the STARJOIN
attribute and NLJOIN elements are not specified, indexAndingType becomes a simple extension of
extendedAccessType. The extendedAccessType type extends the abstract type accessType by adding an
optional INDEX attribute, optional INDEX sub-elements, an optional TYPE attribute, and an optional
ALLINDEXES attribute. The INDEX attribute can be used to specify the first index that is to be used in the
index ANDing operation. If the INDEX attribute is used, the optimizer chooses additional indexes and the
access sequence in a cost-based fashion. The INDEX sub-elements can be used to specify the exact set
of indexes and access sequence. The order in which the INDEX sub-elements appear indicates the order
in which the individual index scans should be performed. The specification of INDEX sub-elements
supersedes the specification of the INDEX attribute.
• If no indexes are specified, the optimizer chooses both the indexes and the access sequence in a cost-
based fashion.
• If indexes are specified using either the attribute or sub-elements, these indexes must be defined on
the table that is identified by the TABLE or TABID attribute.
• If there are no indexes defined on the table, the access request is ignored and an error is returned.
The TYPE attribute, whose only supported value is XMLINDEX, indicates to the optimizer that the table
must be accessed using one or more indexes over XML data.
The optional INDEX attribute can be used to specify an XML index name only if the TYPE attribute has a
value of XMLINDEX. A relational index can be specified in the optional INDEX attribute regardless of the
TYPE attribute specification. The specified index is used by the optimizer as the leading index of an
IXAND plan. The optimizer will add more indexes to the IXAND plan in a cost-based fashion.
The optional INDEX element can be used to specify two or more names of indexes over XML data as index
elements only if the TYPE attribute has a value of XMLINDEX. Relational indexes can be specified in the
optional INDEX elements regardless of the TYPE attribute specification. The specified indexes are used by
the optimizer as the indexes of an IXAND plan in the specified order.
If the TYPE attribute is not present, INDEX attributes and INDEX elements are still valid for relational
indexes.
If the INDEX attribute and the INDEX element are both specified, the INDEX attribute is ignored.
The optional ALLINDEXES attribute, whose only supported value is TRUE, can only be specified if the
TYPE attribute has a value of XMLINDEX. If this attribute is specified, the optimizer must use all
applicable relational indexes and indexes over XML data in an IXAND plan to access the specified table,
regardless of cost.
If the TYPE attribute is specified, but neither INDEX attribute, INDEX element, nor ALLINDEXES attribute
is specified, the optimizer will choose an IXAND plan with at least one index over XML data. Other indexes
in the plan can be either relational indexes or indexes over XML data. The order and choice of indexes is
determined by the optimizer in a cost-based fashion.
Block indexes must appear before record indexes in an index ANDing access request. If this requirement
is not met, an error is returned. The index ANDing access method requires that at least one predicate is
able to be indexed for each index. If index ANDing is not eligible because the required predicate does not
exist, the access request is ignored and SQL0437W with reason code 13 is returned.
The following example illustrates an index ANDing access request:
SQL statement:
Optimization guideline:
<OPTGUIDELINES>
<IXAND TABLE='"Samp".PARTS' FIRST='TRUE'>
<INDEX IXNAME='ISIZE'/>
<INDEX IXNAME='ITYPE'/>
</IXAND>
</OPTGUIDELINES>
The index ANDing request specifies that the PARTS table in the main subselect is to be satisfied using an
index ANDing data access method. The first index scan will use the ISIZE index, and the second index
scan will use the ITYPE index. The indexes are specified by the IXNAME attribute of the INDEX element.
The FIRST attribute setting specifies that the PARTS table is to be the first table in the join sequence with
the SUPPLIERS, PARTSUPP, and derived tables in the same FROM clause.
The following example illustrates a star join index ANDing guideline that specifies the first semi-join but
lets the optimizer choose the remaining ones. It also lets the optimizer choose the specific access
method for the outer table and the index for the inner table in the specified semi-join.
<IXAND TABLE="F">
<NLJOIN>
<ACCESS TABLE="D1"/>
<IXSCAN TABLE="F"/>
</NLJOIN>
</IXAND>
The following guideline specifies all of the semi-joins, including details, leaving the optimizer with no
choices for the plan at and after the IXAND.
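A sketch of such a guideline (the second dimension table, the index names, and the number of semi-joins
are illustrative):
<IXAND TABLE="F">
<NLJOIN>
<IXSCAN TABLE="D1" INDEX="D1_IX"/>
<IXSCAN TABLE="F" INDEX="F_IX1"/>
</NLJOIN>
<NLJOIN>
<IXSCAN TABLE="D2" INDEX="D2_IX"/>
<IXSCAN TABLE="F" INDEX="F_IX2"/>
</NLJOIN>
</IXAND>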
XML Schema
<xs:complexType name="indexOringType">
<xs:complexContent>
<xs:extension base="accessType"/>
</xs:complexContent>
</xs:complexType>
Description
The complex type indexOringType is a simple extension of the abstract type accessType. No new
elements or attributes are added. If the index ORing access method is not in the search space that is in
effect for the statement, the access request is ignored and SQL0437W with reason code 13 is returned.
The optimizer chooses the predicates and indexes that are used in the index ORing operation in a cost-
based fashion. The index ORing access method requires that at least one IN predicate is able to be
indexed or that a predicate with terms is able to be indexed and connected by a logical OR operation. If
index ORing is not eligible because the required predicate or indexes do not exist, the request is ignored
and SQL0437W with reason code 13 is returned.
The following example illustrates an index ORing access request:
SQL statement:
Optimization guideline:
<OPTGUIDELINES>
<IXOR TABLE='S'/>
</OPTGUIDELINES>
This index ORing access request specifies that an index ORing data access method is to be used to access
the SUPPLIERS table that is referenced in the main subselect. The optimizer will choose the appropriate
predicates and indexes for the index ORing operation in a cost-based fashion.
XML Schema
<xs:complexType name="indexScanType">
<xs:complexContent>
<xs:extension base="accessType">
<xs:attribute name="INDEX" type="xs:string" use="optional"/>
</xs:extension>
</xs:complexContent>
</xs:complexType>
Description
The complex type indexScanType extends the abstract accessType by adding an optional INDEX
attribute. The INDEX attribute specifies the unqualified name of the index that is to be used to access the
table.
• If the index scan access method is not in the search space that is in effect for the statement, the access
request is ignored and SQL0437W with reason code 13 is returned.
• If the INDEX attribute is specified, it must identify an index defined on the table that is identified by the
TABLE or TABID attribute. If the index does not exist, the access request is ignored and SQL0437W with
reason code 13 is returned.
• If the INDEX attribute is not specified, the optimizer chooses an index in a cost-based fashion. If no
indexes are defined on the target table, the access request is ignored and SQL0437W with reason code
13 is returned.
The following guideline is an example of an index scan access request:
<OPTGUIDELINES>
<IXSCAN TABLE='S' INDEX='I_SUPPKEY'/>
</OPTGUIDELINES>
XML Schema
<xs:complexType name="listPrefetchType">
<xs:complexContent>
<xs:extension base="accessType">
<xs:attribute name="INDEX" type="xs:string" use="optional"/>
</xs:extension>
</xs:complexContent>
</xs:complexType>
Description
The complex type listPrefetchType extends the abstract type accessType by adding an optional INDEX
attribute. The INDEX attribute specifies the name of the index that is to be used to access the table.
• If the list prefetch access method is not in the search space that is in effect for the statement, the
access request is ignored and SQL0437W with reason code 13 is returned.
• The list prefetch access method requires that at least one predicate is able to be indexed. If the list
prefetch access method is not eligible because the required predicate does not exist, the access
request is ignored and SQL0437W with reason code 13 is returned.
• If the INDEX attribute is specified, it must identify an index defined on the table that is specified by the
TABLE or TABID attribute. If the index does not exist, the access request is ignored and SQL0437W with
reason code 13 is returned.
<OPTGUIDELINES>
<LPREFETCH TABLE='S1' INDEX='I_SNATION'/>
</OPTGUIDELINES>
XML Schema
<xs:complexType name="tableScanType">
<xs:complexContent>
<xs:extension base="accessType"/>
</xs:complexContent>
</xs:complexType>
Description
The complex type tableScanType is a simple extension of the abstract type accessType. No new elements
or attributes are added. If the table scan access method is not in the search space that is in effect for the
statement, the access request is ignored and SQL0437W with reason code 13 is returned.
The following guideline is an example of a table scan access request:
<OPTGUIDELINES>
<TBSCAN TABLE='S1'/>
</OPTGUIDELINES>
XML Schema
<xs:complexType name="XANDORType">
<xs:complexContent>
<xs:extension base="accessType"/>
</xs:complexContent>
</xs:complexType>
Description
The complex type XANDORType is a simple extension of the abstract type accessType. No new elements
or attributes are added.
Example
Given the following query:
The following XANDOR guideline specifies that the SECURITY table should be accessed using a XANDOR
operation against all applicable XML indexes. Any relational indexes on the SECURITY table will not be
considered, because a relational index cannot be used with a XANDOR operator.
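A sketch of such a guideline:
<OPTGUIDELINES>
<XANDOR TABLE='SECURITY'/>
</OPTGUIDELINES>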
XML Schema
<xs:complexType name="indexScanType">
<xs:complexContent>
<xs:extension base="accessType"/>
<xs:attribute name="INDEX" type="xs:string" use="optional"/>
</xs:extension>
</xs:complexContent>
</xs:complexType>
Description
The complex type indexScanType extends the abstract accessType by adding an optional INDEX
attribute. The INDEX attribute specifies the name of the index over XML data that is to be used to access
the table.
• If the index over XML data scan access method is not in the search space that is in effect for the
statement, the access request is ignored and SQL0437W with reason code 13 is returned.
• If the INDEX attribute is specified, it must identify an index over XML data defined on the table that is
identified by the TABLE or TABID attribute. If the index does not exist, the access request is ignored and
SQL0437W with reason code 13 is returned.
• If the INDEX attribute is not specified, the optimizer chooses an index over XML data in a cost-based
fashion. If no indexes over XML data are defined on the target table, the access request is ignored and
SQL0437W with reason code 13 is returned.
Example
Given the following query:
The following XISCAN guideline specifies that the SECURITY table should be accessed using an XML
index named SEC_INDUSTRY.
<OPTGUIDELINES>
<XISCAN TABLE='SECURITY' INDEX='SEC_INDUSTRY'/>
</OPTGUIDELINES>
Join requests
The joinRequest group defines the set of valid join request elements. A join request specifies a method for
joining two tables.
XML Schema
<xs:group name="joinRequest">
<xs:choice>
<xs:element name="NLJOIN" type="nestedLoopJoinType"/>
<xs:element name="HSJOIN" type="hashJoinType"/>
<xs:element name="MSJOIN" type="mergeJoinType"/>
<xs:element name="JOIN" type="anyJoinType"/>
</xs:choice>
</xs:group>
<OPTGUIDELINES>
<HSJOIN>
<ACCESS TABLE='S1'/>
<IXSCAN TABLE='PS1'/>
</HSJOIN>
</OPTGUIDELINES>
The nesting order ultimately determines the join order. The following example illustrates how larger join
requests can be constructed from smaller join requests:
<OPTGUIDELINES>
<MSJOIN>
<NLJOIN>
<IXSCAN TABLE='"Samp".Parts'/>
<IXSCAN TABLE="PS"/>
</NLJOIN>
<IXSCAN TABLE='S'/>
</MSJOIN>
</OPTGUIDELINES>
Join types
Common aspects of all join request elements are defined by the abstract type joinType.
XML Schema
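A sketch of the joinType definition, reconstructed from the description that follows (the occurrence
constraints and the type of the FIRST attribute are assumptions):
<xs:complexType name="joinType" abstract="true">
<xs:choice minOccurs="2" maxOccurs="2">
<xs:group ref="accessRequest"/>
<xs:group ref="joinRequest"/>
</xs:choice>
<xs:attribute name="FIRST" type="xs:string" use="optional" fixed="TRUE"/>
</xs:complexType>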
Description
Join request elements that extend the complex type joinType must have exactly two sub-elements. Either
sub-element can be an access request element chosen from the accessRequest group, or another join
request element chosen from the joinRequest group. The first sub-element appearing in the join request
specifies the outer table of the join operation, and the second element specifies the inner table.
If the FIRST attribute is specified, it must have the value TRUE. Adding the FIRST attribute to a join
request element indicates that you want an execution plan in which the tables that are targeted by the
join request are the outermost tables in the join sequence for the corresponding FROM clause. Only one
access or join request per FROM clause can specify the FIRST attribute. If multiple access or join requests
that target tables of the same FROM clause specify the FIRST attribute, all but the initial request are
ignored and SQL0437W with reason code 13 is returned.
XML Schema
<xs:complexType name="anyJoinType">
<xs:complexContent>
<xs:extension base="joinType"/>
</xs:complexContent>
</xs:complexType>
Description
The complex type anyJoinType is a simple extension of the abstract type joinType. No new elements or
attributes are added.
The following example illustrates the use of the JOIN join request element to force a particular join order
for a set of tables:
SQL statement:
Optimization guideline:
<OPTGUIDELINES>
<JOIN>
<JOIN>
<ACCESS TABLE='"Samp".PARTS'/>
<ACCESS TABLE='S'/>
</JOIN>
<ACCESS TABLE='PS'/>
</JOIN>
</OPTGUIDELINES>
The JOIN join request elements specify that the PARTS table in the main subselect is to be joined with the
SUPPLIERS table, and that this result is to be joined to the PARTSUPP table. The optimizer will choose the
join methods for this particular sequence of joins in a cost-based fashion.
XML Schema
<xs:complexType name="hashJoinType">
<xs:complexContent>
<xs:extension base="joinType"/>
</xs:complexContent>
</xs:complexType>
Description
The complex type hashJoinType is a simple extension of the abstract type joinType. No new elements or
attributes are added. If the hash join method is not in the search space that is in effect for the statement,
the join request is ignored and SQL0437W with reason code 13 is returned.
The following guideline is an example of a hash join request:
<OPTGUIDELINES>
<HSJOIN>
<ACCESS TABLE='S1'/>
<IXSCAN TABLE='PS1'/>
</HSJOIN>
</OPTGUIDELINES>
XML Schema
<xs:complexType name="mergeJoinType">
<xs:complexContent>
<xs:extension base="joinType"/>
</xs:complexContent>
</xs:complexType>
Description
The complex type mergeJoinType is a simple extension of the abstract type joinType. No new elements or
attributes are added. If the merge join method is not in the search space that is in effect for the
statement, the join request is ignored and SQL0437W with reason code 13 is returned.
The following guideline is an example of a merge join request:
<OPTGUIDELINES>
<MSJOIN>
<NLJOIN>
<IXSCAN TABLE='"Samp".Parts'/>
<IXSCAN TABLE="PS"/>
</NLJOIN>
<IXSCAN TABLE='S'/>
</MSJOIN>
</OPTGUIDELINES>
XML Schema
<xs:complexType name="nestedLoopJoinType">
<xs:complexContent>
<xs:extension base="joinType"/>
</xs:complexContent>
</xs:complexType>
Description
The complex type nestedLoopJoinType is a simple extension of the abstract type joinType. No new
elements or attributes are added. If the nested-loop join method is not in the search space that is in
effect for the statement, the join request is ignored and SQL0437W with reason code 13 is returned.
The following guideline is an example of a nested loop join request:
<OPTGUIDELINES>
<NLJOIN>
<IXSCAN TABLE='"Samp".Parts'/>
<IXSCAN TABLE="PS"/>
</NLJOIN>
</OPTGUIDELINES>
SYSTOOLS.OPT_PROFILE table
The SYSTOOLS.OPT_PROFILE table contains all of the optimization profiles.
There are two methods to create this table:
• Call the SYSINSTALLOBJECTS procedure:
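A call of roughly this form creates the table (a sketch; the 'OPT_PROFILES' tool name and the empty
arguments are assumptions, so verify them against the SYSINSTALLOBJECTS documentation):
CALL SYSINSTALLOBJECTS('OPT_PROFILES', 'C', '', '')
• Issue a CREATE TABLE statement. A sketch of the table definition (the column lengths are
assumptions):
CREATE TABLE SYSTOOLS.OPT_PROFILE (
SCHEMA VARCHAR(128) NOT NULL,
NAME VARCHAR(128) NOT NULL,
PROFILE BLOB(2M) NOT NULL,
PRIMARY KEY (SCHEMA, NAME))
Profiles are inserted as rows of this table. One approach is to list the profile schema, the profile name,
and the file that contains the profile XML in a delimited input file, for example: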
"ROBERT","PROF1","ROBERT.PROF1.xml"
"ROBERT","PROF2","ROBERT.PROF2.xml"
"DAVID", "PROF1","DAVID.PROF1.xml"
To update existing rows, use the INSERT_UPDATE option on the IMPORT command:
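For example, reusing the same illustrative input file:
IMPORT FROM profiledata OF DEL MODIFIED BY LOBSINFILE INSERT_UPDATE INTO SYSTOOLS.OPT_PROFILE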
To copy the ROBERT.PROF1 profile into ROBERT.PROF1.xml, assuming that the profile is less than 32
700 bytes long, use the EXPORT command:
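A command of roughly this form does this (a sketch; the SELECT list and predicates are inferred from the
description):
EXPORT TO ROBERT.PROF1.xml OF DEL SELECT PROFILE FROM SYSTOOLS.OPT_PROFILE
WHERE SCHEMA='ROBERT' AND NAME='PROF1'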
For more information, including how to export more than 32 700 bytes of data, see "EXPORT command".
Distribution statistics make the optimizer aware of data skew. Detailed index statistics provide more
details about the I/O required to fetch data pages when the table is accessed by using a particular index.
Collecting detailed index statistics uses considerable processing time and memory for large tables. The
SAMPLED option provides detailed index statistics with nearly the same accuracy but requires a fraction of
the CPU and memory. These options are used by automatic statistics collection when a statistical profile
is not provided for a table.
To improve query performance, consider collecting more advanced statistics, such as column group
statistics or LIKE statistics, or creating statistical views.
Statistical views are helpful when gathering statistics for complex relationships. Gathering statistics for
statistical views can be automated through the automatic statistics collection feature in Db2. Enabling or
disabling the automatic statistic collection of statistical views is done by using the auto_stats_views
database configuration parameter. To enable this function, issue the following command:
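A command of this form enables it for the currently connected database (a sketch):
UPDATE DB CFG USING AUTO_STATS_VIEWS ON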
This database configuration parameter is off by default. The command that is issued to automatically
collect statistics on statistical views is equivalent to the following command:
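That is, roughly the following, run against each enabled statistical view (a sketch; the view name is a
placeholder and the RUNSTATS object keyword is an assumption):
RUNSTATS ON VIEW view_schema.view_name WITH DISTRIBUTION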
Collecting statistics for a large table or statistical view can be time consuming. Statistics of the same
quality can often be collected by considering just a small sample of the overall data. Consider enabling
automatic sampling for all background statistic collections; this may reduce the statistic collection time.
To enable this function, issue the following command:
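A command of this form enables it (a sketch):
UPDATE DB CFG USING AUTO_SAMPLING ON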
Collected statistics are not always exact. In addition to providing more efficient data access, an index can
help provide more accurate statistics for columns which are often used in query predicates. When
statistics are collected for a table and its indexes, index objects can provide accurate statistics for the
leading index columns.
The automatic collection of column group statistics will generate a profile describing the statistics that
need to be collected. If a user profile does not exist, the background statistics collection will initially
perform an automatic discovery of pair-wise column group statistics within the table and set a statistics
profile. After the discovery is completed, statistics are gathered on the table using the existing statistics
profile feature. The set of column groups discovered is preserved across subsequent discoveries.
If a statistics profile is already manually set, it will be used as is and the discovery is not performed. The
automatically generated statistics profile can be used together with any PROFILE option of the
RUNSTATS command. If the profile is updated using the UPDATE PROFILE option, any further discovery
is blocked on the table, but the set of column group statistics already set in the profile will continue to be
collected automatically as well as with a manual RUNSTATS that includes the USE PROFILE option.
The UNSET PROFILE option of the RUNSTATS command can be used to remove the statistics profile and
restart the discovery process.
To disable this feature, issue the following command:
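Assuming the feature is controlled by the auto_cg_stats database configuration parameter (an
assumption; verify the parameter name for your release), a command of this form disables it:
UPDATE DB CFG USING AUTO_CG_STATS OFF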
Disabling this feature will prevent any further discovery, but the statistic profiles will persist and will
continue to be used. If there is a need to remove the profile, use the UNSET PROFILE option of
RUNSTATS.
This query returns the names and raw material quality of all products. There are two join predicates:
PRODUCT.COLOR = RAWMATERIAL.COLOR
PRODUCT.ELASTICITY = RAWMATERIAL.ELASTICITY
The optimizer assumes that the two predicates are independent, which means that all variations of
elasticity occur for each color. It then estimates the overall selectivity of the pair of predicates by using
catalog statistics information for each table based on the number of levels of elasticity and the number of
different colors. Based on this estimate, it might choose, for example, a nested loop join in preference to a
merge join, or the reverse.
However, these two predicates might not be independent. For example, highly elastic materials might be
available in only a few colors, and the very inelastic materials might be available in a few other colors that
are different from the elastic ones. In that case, the combined selectivity of the predicates eliminates
fewer rows and the query returns more rows. Without this information, the optimizer might no longer
choose the best plan.
To collect the column group statistics on PRODUCT.COLOR and PRODUCT.ELASTICITY, issue the
following RUNSTATS command:
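A command of roughly this form collects the column group statistics (a sketch; ON ALL COLUMNS keeps
the usual single-column statistics as well):
RUNSTATS ON TABLE product ON ALL COLUMNS AND COLUMNS ((color, elasticity))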
Without any index or column group statistics, the optimizer estimates the number of groupings (and, in
this case, the number of rows returned) as the product of the number of distinct values in DEPTNO, MGR,
and YEAR_HIRED. This estimate assumes that the grouping key columns are independent. However, this
assumption could be incorrect if each manager manages exactly one department. Moreover, it is unlikely
that each department hires employees every year. Thus, the product of distinct values of DEPTNO, MGR,
and YEAR_HIRED could be an overestimate of the actual number of distinct groups.
Column group statistics collected on DEPTNO, MGR, and YEAR_HIRED provide the optimizer with the
exact number of distinct groupings for the previous query:
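A sketch of such a RUNSTATS invocation:
RUNSTATS ON TABLE employee ON ALL COLUMNS AND COLUMNS ((deptno, mgr, year_hired))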
In addition to JOIN predicate correlation, the optimizer manages correlation with simple equality
predicates, such as:
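For example (an illustrative pair of predicates on the EMPLOYEE table; the values are placeholders):
DEPTNO = 'D11' AND MGR = '000060'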
In this example, predicates on the DEPTNO column in the EMPLOYEE table are likely to be independent of
predicates on the YEAR column. However, the predicates on DEPTNO and MGR are not independent,
because each department would usually be managed by one manager at a time. The optimizer uses
statistical information about columns to determine the combined number of distinct values and then
adjusts the cardinality estimate to account for correlation between columns.
Column group statistics can also be used on statistical views. The column group statistics help adjust the
skewed statistics in the statistical view when there is more than one strong correlation in the queries. The
optimizer can use these statistics to obtain better cardinality estimates which might result in better
access plans.
Statistical views
The Db2 cost-based optimizer uses an estimate of the number of rows processed by an access plan
operator to accurately cost that operator. This cardinality estimate is the single most important input to
the optimizer's cost model.
Once the view is enabled for optimization, it is identified as a statistical view in the SYSCAT.TABLES
catalog view with a 'Y' in position 13 of the PROPERTY column.
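For example, a query of this form (a sketch) lists the statistical views in a database:
SELECT TABSCHEMA, TABNAME
FROM SYSCAT.TABLES
WHERE TYPE = 'V' AND SUBSTR(PROPERTY, 13, 1) = 'Y'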
This statistical view can be used to improve the cardinality estimate and, consequently, the access plan
and query performance for queries such as the following query:
SELECT SUM(S.PRICE)
FROM SALES S, TIME T, PRODUCT P
WHERE
T.TIME_KEY = S.TIME_KEY AND
T.YEAR_MON = 200712 AND
P.PROD_KEY = S.PROD_KEY AND
P.PROD_DESC = 'Power drill'
Without a statistical view, the optimizer assumes that all fact table TIME_KEY values corresponding to a
particular TIME dimension YEAR_MON value occur uniformly within the fact table. However, sales might
have been strong in December, resulting in many more sales transactions than during other months.
Statistics that are gathered on queries that have complex expressions in the predicate can be used by the
optimizer to calculate accurate cardinality estimates which results in better access plans.
For many star join queries, several statistical views might need to be created. You can use referential
integrity constraints to reduce the number of views needed to obtain the required statistical information.
Another way to obtain better access plans is to apply column group statistics on statistical views. These
group statistics help to adjust filter factors which help to gather more accurate statistics which the
optimizer can use to obtain accurate cardinality estimates.
Procedure
1. Enable the view for optimization.
A view can be enabled for optimization by using the ENABLE QUERY OPTIMIZATION clause on the
ALTER VIEW statement. A view that has been enabled for optimization can subsequently be disabled
for optimization by using the DISABLE QUERY OPTIMIZATION clause. For example, to enable MYVIEW
for optimization, enter the following:
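A statement of this form (a sketch):
ALTER VIEW MYVIEW ENABLE QUERY OPTIMIZATION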
To use row-level sampling of 10 percent of the rows while collecting view statistics, including
distribution statistics, enter the following:
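A command of this form (a sketch; BERNOULLI sampling operates at the row level):
RUNSTATS ON VIEW MYVIEW WITH DISTRIBUTION TABLESAMPLE BERNOULLI (10)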
To use page-level sampling of 10 percent of the pages while collecting view statistics, including
distribution statistics, enter the following:
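A command of this form (a sketch; SYSTEM sampling operates at the page level):
RUNSTATS ON VIEW MYVIEW WITH DISTRIBUTION TABLESAMPLE SYSTEM (10)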
3. Optional: If queries that are impacted by the view definition are part of static SQL packages, rebind
those packages to take advantage of changes to access plans resulting from the new statistics.
A star join query execution plan can be an excellent choice for this query, provided that the optimizer can
determine whether the semi-join involving PRODUCT and DAILY_SALES, or the semi-join involving
Suppose the company managers want to determine whether or not consumers will buy a product again if
they are offered a discount on a return visit. Moreover, suppose this study is done only for store '01',
which has 18 locations nationwide. Table 71 on page 396 shows information about the different
categories of promotion that are available.
Table 71. PROMOTION (35 rows)

PROMOTYPE  PROMODESC                  COUNT  PERCENTAGE OF ROWS
1          Return customer discount   1      2.86%
2          Coupon                     15     42.86%
3          Advertisement              5      14.29%
The table indicates that discounts for return customers represent only 2.86% of the 35 kinds of
promotions that were offered.
The following query returns a count of 12 889 514:
select count(*)
from store d1, promotion d2, daily_sales f
where d1.storekey = f.storekey
and d2.promokey = f.promokey
and d1.store_number = '01'
and d2.promotype = 1
This query executes according to the following plan that is generated by the optimizer. In each node of
this diagram, the first row is the cardinality estimate, the second row is the operator type, and the third
row (the number in parentheses) is the operator ID.
6.15567e+06
IXAND
( 8)
/------------------+------------------\
2.15448e+07 2.15448e+08
NLJOIN NLJOIN
( 9) ( 13)
/---------+--------\ /---------+--------\
1 2.15448e+07 18 1.19694e+07
FETCH IXSCAN FETCH IXSCAN
( 10) ( 12) ( 14) ( 16)
/---+---\ | /---+---\ |
35      35              7.54069e+08     18      63              7.54069e+08
IXSCAN  TABLE: DB2DBA   INDEX: DB2DBA   IXSCAN  TABLE: DB2DBA   INDEX: DB2DBA
( 11)   PROMOTION       PROMO_FK_IDX    ( 15)   STORE           STORE_FK_IDX
  |                                       |
At the nested loop join (number 9), the optimizer estimates that around 2.86% of the product sold
resulted from customers coming back to buy the same products at a discounted price (2.15448e+07 ÷
7.54069e+08 ≈ 0.0286). Note that this is the same value before and after joining the PROMOTION table
with the DAILY_SALES table. Table 72 on page 397 summarizes the cardinality estimates and their
percentage (the filtering effect) before and after the join.
Table 72. Cardinality estimates before and after joining with DAILY_SALES.

                       Before Join                     After Join
Predicate              count   percentage of rows     count         percentage of rows
                               qualified                            qualified
store_number = '01'    18      28.57%                 2.15448e+08   28.57%
promotype = 1          1       2.86%                  2.15448e+07   2.86%
Because the probability of promotype = 1 is less than that of store_number = '01', the optimizer
chooses the semi-join between PROMOTION and DAILY_SALES as the outer leg of the star join plan's
index ANDing operation. This leads to an estimated count of approximately 6 155 670 products sold
using promotion type 1 - an incorrect cardinality estimate that is off by a factor of 2.09 (12 889 514 ÷ 6
155 670 ≈ 2.09).
What causes the optimizer to only estimate half of the actual number of records satisfying the two
predicates? Store '01' represents about 28.57% of all the stores. What if other stores had more sales than
store '01' (less than 28.57%)? Or what if store '01' actually sold most of the product (more than 28.57%)?
Likewise, the 2.86% of products sold using promotion type 1 shown in Table 72 on page 397 can be
misleading. The actual percentage in DAILY_SALES could very well be a different figure than the projected
one.
You can use statistical views to help the optimizer correct its estimates. First, you need to create two
statistical views representing each semi-join in the previous query. The first statistical view provides the
distribution of stores for all daily sales. The second statistical view represents the distribution of
promotion types for all daily sales. Note that each statistical view can provide the distribution information
for any particular store number or promotion type. In this example, you use a 10% sample rate to retrieve
the records in DAILY_SALES for the respective views and save them in global temporary tables. You can
then query those tables to collect the necessary statistics to update the two statistical views.
1. Create a view representing the join of STORE with DAILY_SALES.
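The view definition might look like the following sketch, which joins on the STOREKEY column used in the earlier query (the exact select list used in the original example is an assumption here):

create view sv_store_dailysales as
  (select s.*
   from store s, daily_sales ds
   where s.storekey = ds.storekey)

A similar view, SV_PROMOTION_DAILYSALES, joins PROMOTION with DAILY_SALES on the PROMOKEY column.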
3. Make the views statistical views by enabling them for query optimization:
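For example, the two views might be enabled as follows (a sketch using the view names referenced in step 5):

alter view sv_store_dailysales enable query optimization
alter view sv_promotion_dailysales enable query optimization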
5. Run the query again so that it can be re-optimized. Upon reoptimization, the optimizer will match
SV_STORE_DAILYSALES and SV_PROMOTION_DAILYSALES with the query, and will use the view
statistics to adjust the cardinality estimate of the semi-joins between the fact and dimension tables,
causing a reversal of the original order of the semi-joins chosen without these statistics. The new plan
is as follows:
1.04627e+07
IXAND
( 8)
/------------------+------------------\
6.99152e+07 1.12845e+08
NLJOIN NLJOIN
( 9) ( 13)
/---------+--------\ /---------+--------\
18 3.88418e+06 1 1.12845e+08
FETCH IXSCAN FETCH IXSCAN
( 10) ( 12) ( 14) ( 16)
/---+---\ | /---+---\ |
18      63              7.54069e+08     35      35              7.54069e+08
IXSCAN  TABLE: DB2DBA   INDEX: DB2DBA   IXSCAN  TABLE: DB2DBA   INDEX: DB2DBA
( 11)   STORE           STORE_FK_IDX    ( 15)   PROMOTION       PROMO_FK_IDX
  |                                       |
63 35
INDEX: DB2DBA INDEX: DB2DBA
STOREX1 PROMOTION_PK_IDX
Table 73 on page 398 summarizes the cardinality estimates and their percentage (the filtering effect)
before and after the join for each semi-join.
Table 73. Cardinality estimates before and after joining with DAILY_SALES.
Before Join After Join (no statistical views) After Join (with statistical views)
Note that this time, the semi-join between STORE and DAILY_SALES is performed on the outer leg of the
index ANDing plan. This is because the two statistical views essentially tell the optimizer that the
predicate store_number = '01' will filter more rows than promotype = 1. This time, the optimizer
estimates that there are approximately 10 462 700 products sold. This estimate is off by a factor of 1.23
(12 889 514 ÷ 10 462 700 ≈ 1.23), which is a significant improvement over the estimate without
statistical views (in Table 72 on page 397).
The query optimizer can use the statistics from a statistical view for these types of queries to obtain
better access plans.
To obtain statistics for these types of queries, one side of the predicate must be an expression that
matches an expression in the statistical view column definition exactly.
Here are some examples where the query optimizer does not use the statistics from a statistical view:
• One side of the predicate in the query is an expression that is matched with more than one expression
column in a statistical view:
create view SV14(c1, c2) as (select c1+c2, c1*c2 from t1 where c1 > 3);
alter view SV14 enable query optimization;
runstats on table schema.SV14 with distribution;
select * from t1 where (c1+c2) + (c1*c2) > 5 and c1 > 3;
Here the expression (c1+c2) + (c1*c2) is matched to two expression columns in view SV14. The statistics of
view SV14 for this expression are not used.
• One side of the predicate in the query is an expression that is partially matched with an expression
column in a statistical view:
create view SV15(c1, c2) as (select c1+c2, c1*c2 from t1 where c1 > 3);
alter view SV15 enable query optimization;
runstats on table schema.SV15 with distribution;
select * from t1 where (c1+c2) + 10 > 5 and c1 > 3;
Here the expression (c1+c2) + 10 is partially matched to c1+c2 in view SV15. The statistics of view
SV15 for this expression are not used.
• One side of the predicate in the query is indirectly matched to an expression column in a statistical
view:
create view SV16(c1, c2) as (select c1+c2, c1*c2 from t1 where c1 > 3);
alter view SV16 enable query optimization;
runstats on table schema.SV16 with distribution;
select * from t3 left join table (select ta.c1 from t2 left join table
(select c1+c2,c3 from t1 where c1 > 3) as ta(c1,c3) on t2.c1 = ta.c3) as
tb(c1) on t3.c1= TB.C1;
Here the column TB.C1 indirectly matches the expression c1+c2 in view SV16. The statistics of view
SV16 for this expression are not used.
Also, consider that you want to provide statistics for the following query:
select distinct * from F, D1, D2, D3 where F_FK1 = D1_PK and F_FK2
= D2_PK and F_FK3 = D3_PK and D1_C1='ON' and D2_C2>='2009-01-01';
To gather accurate statistics you can create the complete set of views, as follows:
create view SV4 as(select D1.*, D2.*, D3.* from F, D1, D2, D3 where
F_FK1 = D1_PK and F_FK2 = D2_PK and F_FK3 = D3_PK);
alter view SV4 enable query optimization;
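The other views in the complete set are built in the same way, one for each dimension join. A sketch that is consistent with the cardinality discussion later in this section (the exact view definitions are assumptions):

create view SV1 as (select D1.* from F, D1 where F_FK1 = D1_PK);
alter view SV1 enable query optimization;

create view SV2 as (select D2.* from F, D2 where F_FK2 = D2_PK);
alter view SV2 enable query optimization;

create view SV3 as (select D3.* from F, D3 where F_FK3 = D3_PK);
alter view SV3 enable query optimization;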
You can reduce the number of statistical views that you need to create to obtain accurate statistics if referential
integrity constraints exist between the join columns. This reduction in the number of statistical views saves
you time in creating, updating, and maintaining them. For this example, the following single
statistical view would be sufficient to obtain the same statistics as the complete set of statistical views
created earlier:
create view SV5 as (select D1.*, D2.*, D3.*, D4.*, D5.* from F, D1, D2, D3, D4, D5
where
F_FK1 = D1_PK and F_FK2 = D2_PK and F_FK3 = D3_PK
and F_FK4 = D4_PK and F_FK5 = D5_PK
);
alter view SV5 enable query optimization;
The statistics for SV4, SV3, SV2, and SV1 are inferred from SV5 based on referential integrity constraints.
The referential integrity constraints between F, D1, D2, D3, D4, and D5 ensure that the joins among them
are lossless. These lossless joins let us infer that the following cardinalities are the same:
• The cardinality of the join result between F and D1
• The cardinality of SV1
• The cardinality of the statistical view SV5
This query might run slowly, and the cardinality estimate can be inaccurate. If you check the access plan,
you might find that the cardinality estimate is still inaccurate even after it has been adjusted by the
statistical view, because there is a strong correlation between T1.C2 and T2.D3 that statistics on the
individual columns do not capture.
To resolve this situation, you can collect column group statistics of the view SV2 by issuing the following
command:
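The command might look like the following sketch, assuming that the correlated columns are exposed by the view as C2 and D3:

runstats on view SV2 on all columns and columns ((C2, D3))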
These additional statistics help improve the cardinality estimates which might result in better access
plans.
Note: Collecting distribution statistics for a list of columns in a statistical view is not supported.
Collecting column group statistics on statistical views can also be used to compute the number of distinct
groupings, or the grouping key cardinality, for queries that require data to be grouped in a certain way. A
grouping requirement can result from operations such as the GROUP BY or DISTINCT operations.
For example, consider the following query and statistical view:
select T1.C1, T1.C2 from T1,T2 where T1.C3=T2.C3 group by T1.C1, T1.C2;
create view SV2 as (select T1.C1, T1.C2 from T1,T2 where T1.C3=T2.C3);
alter view SV2 enable query optimization;
Collecting column group statistics on the statistical view covering the join predicate helps the optimizer
estimate the grouping key cardinality more accurately. Issue the following command to collect the
column group statistics:
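A sketch of such a command, assuming that the grouping columns T1.C1 and T1.C2 are exposed by the view as C1 and C2:

runstats on view SV2 on all columns and columns ((C1, C2))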
Note:
1. If the table has no indexes defined and you request statistics for indexes, no CARD statistics are
updated. The previous CARD statistics are retained.
Note:
1. Column statistics are collected for the first column in the index key.
2. These statistics provide information about data in columns that contain a series of sub-fields or sub-
elements that are delimited by blanks. The SUB_COUNT and SUB_DELIM_LENGTH statistics are
collected only for columns of type CHAR and VARCHAR with a code page attribute of single-byte
character set (SBCS), FOR BIT DATA, or UTF-8.
Note:
1. Detailed index statistics are collected by specifying the DETAILED clause on the RUNSTATS
command.
2. CLUSTERFACTOR and PAGE_FETCH_PAIRS are collected with the DETAILED clause only if the table is of
sufficient size (greater than about 25 pages); when they are collected, CLUSTERRATIO is -1 (not
collected). If the table is relatively small, only CLUSTERRATIO is collected by the RUNSTATS utility, and
CLUSTERFACTOR and PAGE_FETCH_PAIRS are not collected. If the DETAILED clause is not
specified, only CLUSTERRATIO is collected.
3. This statistic measures the percentage of pages in the file containing the index that belong to that
table. For a table with only one index defined on it, DENSITY should be 100. DENSITY is used by the
optimizer to estimate how many irrelevant pages from other indexes might be read, on average, if the
index pages were prefetched.
4. This statistic cannot be computed when the table is in a DMS table space.
5. Prefetch statistics are not collected during a load or create index operation, even if statistics
collection is specified when the command is invoked. Prefetch statistics are also not collected if the
seqdetect database configuration parameter is set to NO.
6. When the RUNSTATS option for the table is "No", these statistics are not collected when table statistics are
collected; when the RUNSTATS option for indexes is "Yes", these statistics are collected when the RUNSTATS
command is used with the INDEXES option.
Note:
1. Column distribution statistics are collected by specifying the WITH DISTRIBUTION clause on the
RUNSTATS command. Distribution statistics cannot be collected unless there is a sufficient lack of
uniformity in the column values.
2. DISTCOUNT is collected only for columns that are the first key column in an index.
At most, two asynchronous requests can be processed at the same time, and only for different tables.
One request must have been initiated by real-time statistics collection, and the other must have been
initiated by asynchronous statistics collection.
The performance impact of automatic statistics collection is minimized in several ways:
• Asynchronous statistics collection is performed by using a throttled RUNSTATS utility. Throttling
controls the amount of resource that is consumed by the RUNSTATS utility, based on current database
activity. As database activity increases, the utility runs more slowly, reducing its resource demands.
• Synchronous statistics collection is limited to 5 seconds per query. The RTS optimization guideline
determines the amount of time. If synchronous collection exceeds the time limit, an asynchronous
collection request is submitted.
• Synchronous statistics collection does not store the statistics in the system catalog. Instead, the
statistics are stored in a statistics cache and are later stored in the system catalog by an asynchronous
operation. This storage sequence avoids the memory usage and possible lock contention that are
involved in updating the system catalog. Statistics in the statistics cache are available for subsequent
SQL compilation requests.
• Only one synchronous statistics collection operation occurs per table. Other requests requiring
synchronous statistics collection fabricate statistics, if possible, and continue with statement
compilation. This behavior is also enforced in a partitioned database environment, where operations on
different database partitions might require synchronous statistics.
• Only tables with missing statistics or high levels of activity (as measured by the number of update,
insert, or delete operations) are considered for statistics collection. Even if a table meets the statistics
collection criteria, statistics are not collected synchronously unless query optimization requires them.
In some cases, the query optimizer can choose an access plan without statistics. To check if
asynchronous statistics collection is required, tables with more than 4000 pages are sampled to
determine whether high table activity changed the statistics. Statistics for such large tables are
collected only if warranted.
• Statistics collection during an online maintenance window depends on whether the statistics are
asynchronous or synchronous:
– For asynchronous statistics collection, the RUNSTATS utility is automatically scheduled to run during
the online maintenance window that you specify in your maintenance policy. This policy also
specifies the set of tables that are within the scope of automatic statistics collection, minimizing
unnecessary resource consumption.
– Synchronous statistics collection and fabrication do not use the online maintenance window that you
specify in your maintenance policy, because synchronous requests must occur immediately and have
only a limited amount of time in which to complete.
Table 81. Real-time statistics collection as a function of the value of the CURRENT EXPLAIN MODE special
register

CURRENT EXPLAIN MODE special register value    Real-time statistics collection considered
YES                                            Yes
EXPLAIN                                        No
NO                                             Yes
REOPT                                          Yes
RECOMMEND INDEXES                              No
EVALUATE INDEXES                               No
Procedure
After setting the auto_maint and the auto_tbl_maint database configuration parameters to ON, you
have the following options:
• To enable background statistics collection, set the auto_runstats database configuration
parameter to ON.
• To enable background statistics collection for statistical views, set both the auto_stats_views and
auto_runstats database configuration parameters to ON.
• To enable background statistics collection to use sampling automatically for large tables and
statistical views, also set the auto_sampling database configuration parameter to ON. Use this
setting in addition to auto_runstats (tables only) or to auto_runstats and auto_stats_views
(tables and statistical views).
• To enable real-time statistics collection, set both auto_stmt_stats and auto_runstats database
configuration parameters to ON.
• To enable background statistics collection of column group statistics on base tables, set both the
auto_runstats and auto_cg_stats database configuration parameters to ON. To minimize the
overhead of collecting the extra column group statistics on large tables, enable background statistics
collection to use sampling automatically by setting auto_sampling to ON.
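For example, the parameters listed above might be set from the Db2 command line as follows (SAMPLE is a placeholder database name):

update db cfg for sample using auto_maint on auto_tbl_maint on
update db cfg for sample using auto_runstats on auto_stmt_stats on
update db cfg for sample using auto_stats_views on auto_sampling on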
OBJTYPE VARCHAR(64) The type of object to which the event applies. For statistics
logging, this is the type of statistics to be collected. OBJTYPE
can refer to a statistics collection background process when the
process starts or stops. It can also refer to activities that are
performed by automatic statistics collection, such as a
sampling test, initial sampling, and table evaluation.
Possible values for statistics collection activities are:
TABLE STATS
Table statistics are to be collected.
INDEX STATS
Index statistics are to be collected.
TABLE AND INDEX STATS
Both table and index statistics are to be collected.
Possible values for automatic statistics collection are:
EVALUATION
The automatic statistics background collection process
has begun an evaluation phase. During this phase, tables
will be checked to determine if they need updated
statistics, and statistics will be collected, if necessary.
INITIAL SAMPLING
Statistics are being collected for a table using sampling.
The sampled statistics are stored in the system catalog.
This allows automatic statistics collection to proceed
quickly for a table with no statistics. Subsequent
operations will collect statistics without sampling. Initial
sampling is performed during the evaluation phase of
automatic statistics collection.
SAMPLING TEST
Statistics are being collected for a table using sampling.
The sampled statistics are not stored in the system
catalog. The sampled statistics will be compared to the
current catalog statistics to determine if and when full
statistics should be collected for this table. The sampling is
performed during the evaluation phase of automatic
statistics collection.
STATS DAEMON
The statistics daemon is a background process used to
handle requests that are submitted by real-time statistics
processing. This object type is logged when the
background process starts or stops.
COLUMN GROUP STATS
The statistics collection is performing a discovery to
identify the column group statistics to collect.
OBJNAME VARCHAR(255) The name of the object to which the event applies, if available.
For statistics logging, this is the table or index name. If
OBJTYPE is STATS DAEMON or EVALUATION, OBJNAME is the
database name and OBJNAME_QUALIFIER is NULL.
OBJNAME_QUALIFIER VARCHAR(255) For statistics logging, this is the schema of the table or index.
EVENTTYPE VARCHAR(24) The event type is the action that is associated with this event.
Possible values for statistics logging are:
COLLECT
This action is logged for a statistics collection operation.
START
This action is logged when the real-time statistics
background process (OBJTYPE = STATS DAEMON) or an
automatic statistics collection evaluation phase (OBJTYPE
= EVALUATION) starts.
STOP
This action is logged when the real-time statistics
background process (OBJTYPE = STATS DAEMON) or an
automatic statistics collection evaluation phase (OBJTYPE
= EVALUATION) stops.
ACCESS
This action is logged when an attempt has been made to
access a table for statistics collection purposes. This event
type is used to log an unsuccessful access attempt when
the object is unavailable.
WRITE
This action is logged when previously collected statistics
that are stored in the statistics cache are written to the
system catalog.
DISCOVER
This action is logged for a statistics discovery operation.
FIRST_EVENTQUALIFIERTYPE VARCHAR(64) The type of the first event qualifier. Event qualifiers are used to
describe what was affected by the event. For statistics logging,
the first event qualifier is the timestamp for when the event
occurred. For the first event qualifier type, the value is AT.
FIRST_EVENTQUALIFIER CLOB(16k) The first qualifier for the event. For statistics logging, the first
event qualifier is the timestamp for when the statistics event
occurred. The timestamp of the statistics event might be
different than the timestamp of the log record, as represented
by the TIMESTAMP column.
SECOND_EVENTQUALIFIERTYPE VARCHAR(64) The type of the second event qualifier. For statistics logging, the
value can be BY or NULL. This field is not used for other event
types.
SECOND_EVENTQUALIFIER CLOB(16k) The second qualifier for the event. For statistics logging, this
represents how statistics were collected for COLLECT event
types. Possible values are:
User
Statistics collection was performed by a Db2 user invoking
the LOAD, REDISTRIBUTE, or RUNSTATS command, or
issuing the CREATE INDEX statement.
Synchronous
Statistics collection was performed at SQL statement
compilation time by the Db2 server. The statistics are
stored in the statistics cache but not the system catalog.
Synchronous sampled
Statistics collection was performed using sampling at SQL
statement compilation time by the Db2 server. The
statistics are stored in the statistics cache but not the
system catalog.
Fabricate
Statistics were fabricated at SQL statement compilation
time using information that is maintained by the data and
index manager. The statistics are stored in the statistics
cache but not the system catalog.
Fabricate partial
Only some statistics were fabricated at SQL statement
compilation time using information that is maintained by
the data and index manager. In particular, only the
HIGH2KEY and LOW2KEY values for certain columns were
fabricated. The statistics are stored in the statistics cache
but not the system catalog.
Asynchronous
Statistics were collected by a Db2 background process and
are stored in the system catalog.
This field is not used for other event types.
THIRD_EVENTQUALIFIERTYPE VARCHAR(64) The type of the third event qualifier. For statistics logging, the
value can be DUE TO or NULL.
THIRD_EVENTQUALIFIER CLOB(16k) The third qualifier for the event. For statistics logging, this
represents the reason why a statistics activity could not be
completed. Possible values are:
Timeout
Synchronous statistics collection exceeded the time
budget.
Error
The statistics activity failed due to an error.
RUNSTATS error
Synchronous statistics collection failed due to a RUNSTATS
error. For some errors, SQL statement compilation might
have completed successfully, even though statistics could
not be collected. For example, if there was insufficient
memory to collect statistics, SQL statement compilation
will continue.
Object unavailable
Statistics could not be collected for the database object
because it could not be accessed. Some possible reasons
include:
• The object is locked in super exclusive (Z) mode
• The table space in which the object resides is
unavailable
• The table indexes need to be recreated
Conflict
Synchronous statistics collection was not performed
because another application was already collecting
synchronous statistics.
Check the FULLREC column or the db2diag log files for the
error details.
EVENTSTATE VARCHAR(255) State of the object or action as a result of the event. For
statistics logging, this indicates the state of the statistics
operation. Possible values are:
• Start
• Success
• Failure
Examples
In this example, the query returns statistics log records for events up to one year prior to the current
timestamp by invoking PD_GET_DIAG_HIST.
The results are ordered by the timestamp stored in the FIRST_EVENTQUALIFIER column, which
represents the time of the statistics event.
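The query might look like the following sketch; the PD_GET_DIAG_HIST argument values and the SUBSTR lengths are assumptions chosen to match the column headings shown below:

select pid, tid,
       substr(eventtype, 1, 10) as eventtype,
       substr(objtype, 1, 30) as objtype,
       substr(objname_qualifier, 1, 20) as objschema,
       substr(objname, 1, 10) as objname,
       substr(first_eventqualifier, 1, 26) as event1,
       substr(second_eventqualifiertype, 1, 2) as event2_type,
       substr(second_eventqualifier, 1, 20) as event2,
       substr(third_eventqualifiertype, 1, 6) as event3_type,
       substr(third_eventqualifier, 1, 15) as event3,
       substr(eventstate, 1, 20) as eventstate
-- the 'optstats' facility is assumed; the record type and impact arguments are assumptions
from table(sysproc.pd_get_diag_hist('optstats', 'EX', 'NONE',
           current_timestamp - 1 year, cast(null as timestamp))) as sl
order by timestamp(varchar(substr(first_eventqualifier, 1, 26), 26));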
PID  TID  EVENTTYPE  OBJTYPE  OBJSCHEMA  OBJNAME  EVENT1  EVENT2_TYPE  EVENT2  EVENT3_TYPE  EVENT3  EVENTSTATE
Procedure
1. Create a table with appropriate columns for the log records.
Examples
The following conditions apply to all the examples:
• Table TBL1 is created with expression-based index IND1.
• Associated statistical view IND1_V is automatically created.
Example 1
The RUNSTATS command is issued on table TBL1 in two ways:
runstats on table TBL1 with distribution on key columns and index IND2
The results in both cases are the same: the index statistics for IND1 are not updated, and the statistical
view statistics for IND1_V are not updated, even though the ON KEY COLUMNS parameter was specified
in the second case. (Specifying the ALL COLUMNS parameter would not change the results, either.) To
gather statistics on expression-based key columns in an expression-based index, you must explicitly or
implicitly include that index in the RUNSTATS command.
Example 2
The RUNSTATS command is issued on table TBL1 in three ways:
• Index IND1 is specified:
In all of these cases, the index statistics for IND1 are updated. As well, the statistical view statistics for
IND1_V are updated with basic column statistics for all expression columns. These results apply even
though the ON COLUMNS AND clause was specified.
Examples
The following initial conditions apply to the examples:
• Table TBL1 has no statistics profile.
• Table TBL1 has an expression-based index IND1.
• Index IND1 has an automatically generated statistical view IND1_V.
• The NUM_FREQVALUES database configuration parameter is set to the default value of 10.
Example 1
The following command sets a statistics profile on the statistical view IND1_V, which gathers extra
distribution statistics:
RUNSTATS ON VIEW IND1_V WITH DISTRIBUTION DEFAULT NUM_FREQVALUES 40 SET PROFILE ONLY
Because there is no statistics profile on the table, a statistics profile is generated when the command is
issued in the following form:
RUNSTATS ON TABLE TBL1 WITH DISTRIBUTION AND SAMPLED DETAILED INDEXES ALL
The statistics profile on the table and the statistics profile on the statistical view are applied. Statistics are
gathered for the table and the expression columns in the index. The columns in the table have the usual
amount of frequent value statistics and the columns in the statistical view have more frequent value
statistics.
Example 2
The following command sets a statistics profile that samples a large table with the aim to shorten the
execution time for the RUNSTATS command and to reduce the impact on the overall system (the profile is
set, but the command to sample the table is not issued):
RUNSTATS ON TABLE TBL1 WITH DISTRIBUTION AND INDEXES ALL TABLESAMPLE SYSTEM (2.5) INDEXSAMPLE SYSTEM (10) SET PROFILE ONLY
UTIL_IMPACT_PRIORITY 30
It is decided that distribution statistics are not needed on the expression-based keys in index IND1.
However, LIKE statistics are required on the second key column in the index. According to the definition
for statistical view IND1_V in the catalogs, the second column in the view is named K01.
RUNSTATS ON VIEW IND1_V ON ALL COLUMNS AND COLUMNS(K01 LIKE STATISTICS) SET PROFILE
ONLY
Now the statistics profile is set and the following command is issued to gather statistics that are based on
the statistics profile:
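The command might be the following (a sketch using the USE PROFILE parameter of the RUNSTATS command):

RUNSTATS ON TABLE TBL1 USE PROFILE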
The statistics profile on the table and the statistics profile on the statistical view are applied. Statistics are
gathered for the table and the expression-based columns. The results are as follows:
• Distribution statistics are gathered for the base table columns but not for the expression-based key
columns.
• The LIKE statistics are gathered for the specified expression-based key column.
• While the RUNSTATS command is running, the expression-based key column values are sampled at the
rate dictated by the INDEXSAMPLE SYSTEM(10) parameter in the table's profile.
• The table's UTIL_IMPACT_PRIORITY parameter setting governs the priority of the entire RUNSTATS
command operation.
Procedure
To collect catalog statistics:
1. Connect to the database that contains the tables, indexes, or statistical views for which you want to
collect statistical information.
2. Collect statistics for queries that run against the tables, indexes, or statistical views by using one of the
following methods:
• From the Db2 command line, execute the RUNSTATS command with appropriate options. These
options enable you to tailor the statistics that are collected for queries that run against the tables,
indexes, or statistical views.
• From IBM Data Studio, open the task assistant for the RUNSTATS command.
3. When the runstats operation completes, issue a COMMIT statement to release locks.
4. Rebind any packages that access the tables, indexes, or statistical views for which you have updated
statistical information.
Results
Note:
1. The RUNSTATS command does not support the use of nicknames. If queries access a federated
database, use RUNSTATS to update statistics for tables in all databases, then drop and recreate the
nicknames that access remote tables to make the new statistics available to the optimizer.
2. When you collect statistics for a table in a partitioned database environment, RUNSTATS only operates
on the database partition from which the utility is executed. The results from this database partition
are extrapolated to the other database partitions. If this database partition does not contain a required
portion of the table, the request is sent to the first database partition in the database partition group
that contains the required data.
Statistics for a statistical view are collected on all database partitions containing base tables that are
referenced by the view.
3. For Db2 V9.7 Fix Pack 1 and later releases, the following apply to the collection of distribution
statistics on a column of type XML:
Sub-element statistics
If you specify LIKE predicates using the % wildcard character in any position other than at the end of the
pattern, you should collect basic information about the sub-element structure.
As well as the wildcard LIKE predicate (for example, SELECT...FROM DOCUMENTS WHERE KEYWORDS
LIKE '%simulation%'), the columns and the query must fit certain criteria to benefit from sub-
element statistics.
Table columns should contain sub-fields or sub-elements separated by blanks. For example, a four-row
table DOCUMENTS contains a KEYWORDS column with lists of relevant keywords for text retrieval
purposes. The values in KEYWORDS are:
In this example, each column value consists of five sub-elements, each of which is a word (the keyword),
separated from the others by one blank.
The query should reference these columns in WHERE clauses.
The optimizer always estimates how many rows match each predicate. For these wildcard LIKE
predicates, the optimizer assumes that the column being matched contains a series of elements
concatenated together, and it estimates the length of each element based on the length of the string,
excluding leading and trailing % characters. If you collect sub-element statistics, the optimizer will have
information about the length of each sub-element and the delimiter. It can use this additional information
to more accurately estimate how many rows will match the predicate.
To collect sub-element statistics, execute the RUNSTATS command with the LIKE STATISTICS
parameter.
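For example, for the KEYWORDS column of the DOCUMENTS table described earlier, the command might look like the following sketch (the DB2USER schema is a placeholder):

runstats on table db2user.documents
  on all columns and columns (keywords like statistics)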
The RUNSTATS utility might take longer to complete if you use the LIKE STATISTICS clause. If you are
considering this option, assess the improvements in query performance against this additional overhead.
Procedure
To collect detailed statistics for an index:
1. Connect to the SALES database.
2. Execute one of the following commands from the Db2 command line, depending on your
requirements:
• To collect detailed statistics on both CUSTIDX1 and CUSTIDX2:
• To collect detailed statistics on both indexes, but with sampling instead of detailed calculations on
each index entry:
The SAMPLED DETAILED parameter requires 2 MB of the statistics heap. Allocate an additional
488 4-KB pages to the stat_heap_sz database configuration parameter setting for this memory
requirement. If the heap is too small, the RUNSTATS utility returns an error before it attempts to
collect statistics.
• To collect detailed statistics on sampled indexes, as well as distribution statistics for the table so
that index and table statistics are consistent:
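The commands for the three options above might look like the following sketch, assuming that CUSTIDX1 and CUSTIDX2 are the only indexes defined on a table named SALES.CUSTOMERS (the table name is a placeholder; it is not named in this procedure):

runstats on table sales.customers and detailed indexes all

runstats on table sales.customers and sampled detailed indexes all

runstats on table sales.customers with distribution
  and sampled detailed indexes all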
PAGE_FETCH_PAIRS =
'100 380 120 360 140 340 160 330 180 320 200 310 220 305 240 300
260 300 280 300 300 300'
where
NPAGES = 300
CARD = 10000
CLUSTERRATIO = -1
CLUSTERFACTOR = 0.9
• For indexes over XML data, the relationship among FIRSTKEYCARD, FIRST2KEYCARD,
FIRST3KEYCARD, FIRST4KEYCARD, FULLKEYCARD, and INDCARD must be as follows:
where c1 = key;
where c1 in (key1, key2, key3);
where (c1 = key1) or (c1 = key2) or (c1 = key3);
where c1 <= key;
where c1 between key1 and key2;
Two types of nonuniform data distribution can occur, and possibly together.
• Data might be highly clustered instead of being evenly spread out between the highest and lowest data
value. Consider the following column, in which the data is clustered in the range (5,10):
0.0
5.1
6.3
7.1
8.2
8.4
8.5
9.1
93.6
100.0
Quantile statistics help the optimizer to deal with this kind of data distribution.
Queries can help you to determine whether column data is not uniformly distributed. For example:
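A query of the following form, for example, shows how often each value occurs and therefore how skewed the data is (C1 and TABLE1 are placeholders):

select c1, count(*) as occurrences
from table1
group by c1
order by occurrences desc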
• Duplicate data values might often occur. Consider a column in which the data is distributed with the
following frequencies:
Both frequent-value and quantile statistics help the optimizer to deal with numerous duplicate values.
select c1, c2
from table1
where c1 = 'NEW YORK'
and c2 <= 10
Assume that there is an index on both columns C1 and C2. One possible access plan is to use the index on
C1 to retrieve all rows with C1 = 'NEW YORK', and then to check whether C2 <= 10 for each retrieved
row. An alternate plan is to use the index on C2 to retrieve all rows with C2 <= 10, and then to check
whether C1 = 'NEW YORK' for each retrieved row. Because the primary cost of executing a query is
usually the cost of retrieving the rows, the best plan is the one that requires the fewest retrievals.
Choosing this plan means estimating the number of rows that satisfy each predicate.
When distribution statistics are not available, but the runstats utility has been used on a table or a
statistical view, the only information that is available to the optimizer is the second-highest data value
(HIGH2KEY), the second-lowest data value (LOW2KEY), the number of distinct values (COLCARD), and
the number of rows (CARD) in a column. The number of rows that satisfy an equality or range predicate is
estimated under the assumption that the data values in the column have equal frequencies and that the
data values are evenly distributed between LOW2KEY and HIGH2KEY. Specifically, the number of rows
that satisfy an equality predicate (C1 = KEY) is estimated as CARD/COLCARD, and the number of rows
that satisfy a range predicate (C1 BETWEEN KEY1 AND KEY2) can be estimated with the following
formula:
KEY2 - KEY1
------------------ x CARD
HIGH2KEY - LOW2KEY
These estimates are accurate only when the true distribution of data values within a column is reasonably
uniform. When distribution statistics are unavailable, and either the frequency of data values varies
widely, or the data values are very unevenly distributed, the estimates can be off by orders of magnitude,
and the optimizer might choose a suboptimal access plan.
When distribution statistics are available, the probability of such errors can be greatly reduced by using
frequent-value statistics to estimate the number of rows that satisfy an equality predicate, and by using
both frequent-value statistics and quantile statistics to estimate the number of rows that satisfy a range
predicate.
Procedure
To collect statistics on specific columns:
1. Connect to the SALES database.
2. Execute one of the following commands from the Db2 command line, depending on your
requirements:
• To collect distribution statistics on columns ZIP and YTDTOTAL:
• To collect distribution statistics on the same columns, but with different distribution options:
• To collect distribution statistics on the columns that are indexed in CUSTIDX1 and CUSTIDX2:
• To collect statistics for columns ZIP and YTDTOTAL and a column group that includes REGION and
TERRITORY:
• Suppose that statistics for non-XML columns were collected previously using the LOAD command
with the STATISTICS parameter. To collect statistics for the XML column MISCINFO:
The EXCLUDING XML COLUMNS clause takes precedence over all other clauses that specify XML
columns.
• For Db2 V9.7 Fix Pack 1 and later releases, the following command collects distribution statistics
using a maximum of 50 quantiles for the XML column MISCINFO. A default of 20 quantiles is used
for all other columns in the table:
Note: The following are required for distribution statistics to be collected on the XML column
MISCINFO:
– Both table and distribution statistics must be collected.
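For the first and fourth options above, the commands might look like the following sketch (SALES.CUSTOMERS is a placeholder table name):

runstats on table sales.customers with distribution on columns (zip, ytdtotal)

runstats on table sales.customers on columns (zip, ytdtotal, (region, territory))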
CARD - NUM_FREQ_ROWS
--------------------
COLCARD - N
where CARD is the number of rows in the table, COLCARD is the cardinality of the column, and
NUM_FREQ_ROWS is the total number of rows with a value equal to one of the N most frequent values.
For example, consider a column C1 whose data values exhibit the following frequencies:
The number of rows in the table is 50 and the column cardinality is 5. Exactly 40 rows satisfy the
predicate C1 = 3. If it is assumed that the data is evenly distributed, the optimizer estimates the number
of rows that satisfy the predicate as 50/5 = 10, with an error of -75%. But if frequent-value statistics
based on only the most frequent value (that is, N = 1) are available, the number of rows is estimated as
40, with no error.
Consider another example in which two rows satisfy the predicate C1 = 1. Without frequent-value
statistics, the number of rows that satisfy the predicate is estimated as 10, an error of 400%:
10 - 2
------ X 100 = 400%
2
Using frequent-value statistics (N = 1), the optimizer estimates the number of rows containing this value
using the formula (1) given previously as:

(50 - 40)
--------- ≈ 3
 (5 - 1)

and the error in the estimate is reduced to only 50%:

3 - 2
----- X 100 = 50%
  2
0.0
5.1
6.3
7.1
8.2
8.4
8.5
9.1
93.6
100.0
K K-quantile
1 0.0
4 7.1
7 8.5
10 100.0
• Exactly seven rows satisfy the predicate C <= 8.5. Assuming a uniform data distribution, the following
formula (2):
KEY2 - KEY1
------------------ X CARD
HIGH2KEY - LOW2KEY
with LOW2KEY in place of KEY1, estimates the number of rows that satisfy the predicate as:
8.5 - 5.1
---------- X 10 ≈ 0
93.6 - 5.1
where ≈ means "approximately equal to". The error in this estimate is approximately -100%.
If quantile statistics are available, the optimizer estimates the number of rows that satisfy this
predicate by the value of K that corresponds to 8.5 (the highest value in one of the quantiles), which is
7. In this case, the error is reduced to 0.
• Exactly eight rows satisfy the predicate C <= 10. If the optimizer assumes a uniform data distribution
and uses formula (2), the number of rows that satisfy the predicate is estimated as 1, an error of
-87.5%.
Unlike the previous example, the value 10 is not one of the stored K-quantiles. However, the optimizer
can use quantiles to estimate the number of rows that satisfy the predicate as r_1 + r_2, where r_1
is the number of rows satisfying the predicate C <= 8.5 and r_2 is the number of rows satisfying the
predicate C > 8.5 AND C <= 10. The value of r_2 is estimated as follows:
10 - 8.5
r_2 ≈ ---------- X (number of rows with value > 8.5 and <= 100.0)
100 - 8.5
10 - 8.5
r_2 ≈ ---------- X (10 - 7)
100 - 8.5
1.5
r_2 ≈ ----- X (3)
91.5
r_2 ≈ 0
The final estimate is r_1 + r_2 ≈ 7, and the error is only -12.5%.
Quantiles improve the accuracy of the estimates in these examples because the real data values are
"clustered" in a range from 5 to 10, but the standard estimation formulas assume that the data values are
distributed evenly between 0 and 100.
The use of quantiles also improves accuracy when there are significant differences in the frequencies of
different data values. Consider a column having data values with the following frequencies:
Suppose that K-quantiles are available for K = 5, 25, 75, 95, and 100:
K K-quantile
5 20
25 40
75 50
95 70
100 80
Suppose also that frequent-value statistics are available, based on the three most frequent values.
Exactly 10 rows satisfy the predicate C BETWEEN 20 AND 30. Assuming a uniform data distribution and
using formula (2), the number of rows that satisfy the predicate is estimated as:
30 - 20
------- X 100 = 25
70 - 30
an error of 150%.
Using frequent-value statistics and quantile statistics, the number of rows that satisfy the predicate is
estimated as r_1 + r_2, where r_1 is the number of rows that satisfy the predicate (C = 20) and r_2
is the number of rows that satisfy the predicate (C > 20 AND C <= 30). Using formula (1), r_1 is
estimated as:

100 - 80
-------- = 5
 7 - 3

Using the quantile statistics, r_2 is estimated as:
30 - 20
------- X (number of rows with a value > 20 and <= 40)
40 - 20
30 - 20
= ------- X (25 - 5)
40 - 20
= 10
This yields a final estimate of 15 and reduces the error by a factor of three.
For example, consider EU_SHOE, a UDF that converts an American shoe size to the equivalent European
shoe size. For this UDF, you might set the values of statistic columns in SYSSTAT.ROUTINES as follows:
• INSTS_PER_INVOC. Set to the estimated number of machine instructions required to:
– Call EU_SHOE
– Initialize the output string
– Return the result
• INSTS_PER_ARGBYTE. Set to the estimated number of machine instructions required to convert the
input string into a European shoe size
• PERCENT_ARGBYTES. Set to 100, indicating that the entire input string is to be converted
• INITIAL_INSTS, IOS_PER_INVOC, IOS_PER_ARGBYTE, and INITIAL_IOS. Each set to 0, because this
UDF only performs computations
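A statement of the following form could record these estimates in the catalog. The numeric values shown are placeholders, and the routine is assumed to be registered under the schema MYSCHEMA:

update sysstat.routines
set
  insts_per_invoc = 5000,
  insts_per_argbyte = 50,
  percent_argbytes = 100,
  initial_insts = 0,
  ios_per_invoc = 0,
  ios_per_argbyte = 0,
  initial_ios = 0
where routineschema = 'MYSCHEMA'
  and routinename = 'EU_SHOE'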
PERCENT_ARGBYTES would be used by a function that does not always process the entire input string.
For example, consider LOCATE, a UDF that accepts two arguments as input and returns the starting
position of the first occurrence of the first argument within the second argument. Assume that the length
of the first argument is small enough to be insignificant relative to the second argument and that, on
average, 75% of the second argument is searched. Based on this information and the following
assumptions, PERCENT_ARGBYTES should be set to 75:
• Half the time the first argument is not found, which results in searching the entire second argument
• The first argument is equally likely to appear anywhere within the second argument, which results in
searching half of the second argument (on average) when the first argument is found
You can use INITIAL_INSTS or INITIAL_IOS to record the estimated number of machine instructions or
read or write requests that are performed the first or last time that a function is invoked; this might
represent the cost, for example, of setting up a scratchpad area.
To obtain information about I/Os and the instructions that are used by a UDF, use output provided by your
programming language compiler or by monitoring tools that are available for your operating system.
Procedure
Issue the LIST UTILITIES command and specify the SHOW DETAIL parameter:

list utilities show detail

or issue the db2pd command and specify the -runstats parameter:

db2pd -runstats
Results
The following is an example of the output for monitoring the performance of a RUNSTATS operation using
the LIST UTILITIES command:
ID = 7
Type = RUNSTATS
Database Name = SAMPLE
Partition Number = 0
Description = YIWEIANG.EMPLOYEE
Start Time = 08/04/2011 12:39:35.155398
State = Executing
Invocation Type = User
Throttling:
Priority = Unthrottled
The following is an example of the output for monitoring the performance of a RUNSTATS operation using
the db2pd command:
db2pd -runstats
For example, the following statement manually updates the catalog statistics for the MELNYK.EMPLOYEE table:

update sysstat.tables
set
card = 10000,
npages = 1000,
fpages = 1000,
overflow = 2
where tabschema = 'MELNYK'
and tabname = 'EMPLOYEE'
Care must be taken when manually updating catalog statistics. Arbitrary changes can seriously alter the
performance of subsequent queries. You can use any of the following methods to return the statistics on
your development system to a consistent state:
• Roll back the unit of work in which your manual changes were made (assuming that the unit of work has
not yet been committed).
• Use the runstats utility to refresh the catalog statistics.
• Update the catalog statistics to specify that statistics have not been collected; for example, setting the
NPAGES column value to -1 indicates that this statistic has not been collected.
• Undo the changes that you made. This method is possible only if you used the db2look command to
capture the statistics before you made any changes.
If the optimizer determines that some value or combination of values is not valid, it uses default values
and returns a warning. This is quite rare, however, because most validation is performed when the
statistics are updated.
<column>
<name>COLNAME100</name>
<colcard>55000</colcard>
<high2key>49999</high2key>
<low2key>100</low2key>
</column>
</colstats>
You can save Design Advisor recommendations to a file using the -o parameter on the db2advis
command. The saved Design Advisor output consists of the following elements:
• CREATE statements associated with any new indexes, MQTs, MDC tables, or database partitioning
strategies
• REFRESH statements for MQTs
• RUNSTATS commands for new objects
An example of this output is as follows:
--<?xml version="1.0"?>
--<design-advisor>
--<mqt>
--<identifier>
--<name>MQT612152202220000</name>
--<schema>ZILIO2 </schema>
--</identifier>
--<statementlist>3</statementlist>
--<benefit>1013562.481682</benefit>
--<overhead>1468328.200000</overhead>
--<diskspace>0.004906</diskspace>
--</mqt>
.....
--<index>
--<identifier>
--<name>IDX612152221400000</name>
--<schema>ZILIO2 </schema>
--</identifier>
--<table><identifier>
--<name>PART</name>
--<schema>SAMP </schema>
This XML structure can contain more than one column. The column cardinality (that is, the number of
values in each column) is included and, optionally, the HIGH2KEY and LOW2KEY values.
The base table on which an index is defined is also included. Ranking of indexes and MQTs can be done
using the benefit value. You can also rank indexes using (benefit - overhead) and MQTs using (benefit -
0.5 * overhead).
Following the list of indexes and MQTs is the list of statements in the workload, including the SQL text, the
statement number for the statement, the estimated performance improvement (benefit) from the
recommendations, and the list of tables, indexes, and MQTs that were used by the statement. The original
spacing in the SQL text is preserved in this output example, but the SQL text is normally split into 80
character commented lines for increased readability.
Existing indexes or MQTs are included in the output if they are being used to execute a workload.
MDC and database partitioning recommendations are not explicitly shown in this XML output example.
After some minor modifications, you can run this output file as a CLP script to create the recommended
objects. The modifications that you might want to perform include:
• Combining all of the RUNSTATS commands into a single RUNSTATS invocation against the new or
modified objects
• Providing more usable object names in place of system-generated IDs
Procedure
1. Define your workload.
See "Defining a workload for the Design Advisor".
2. Run the db2advis command against this workload.
Note: If the statistics on your database are not current, the generated recommendations are less
reliable.
3. Interpret the output from db2advis and make any necessary modifications.
4. Implement the Design Advisor recommendations, as appropriate.
Procedure
• To run the Design Advisor against dynamic SQL statements:
a) Reset the database monitor with the following command:
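For example (a sketch; SAMPLE is a placeholder database name):

reset monitor for database sample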
b) Wait for an appropriate amount of time to allow for the execution of dynamic SQL statements
against the database.
c) Invoke the db2advis command using the -g parameter. If you want to save the dynamic SQL
statements in the ADVISE_WORKLOAD table for later reference, use the -p parameter as well.
• To run the Design Advisor against a set of SQL statements in a workload file:
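For example, the workload file can be passed to the db2advis command with the -i parameter (a sketch; the database name and file names are placeholders):

db2advis -d sample -i workload.sql -o advise_out.sql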
In addition to making suggestions about new indexes, materialized query tables (MQTs), and
multidimensional clustering (MDC) tables, the Design Advisor can provide you with suggestions for
distributing data.
Procedure
1. Use the db2licm command to register the partitioned database environment license key.
2. Create at least one table space in a multi-partition database partition group.
Note: The Design Advisor can only suggest data redistribution to existing table spaces.
3. Run the Design Advisor with the partitioning option specified on the db2advis command.
4. Modify the db2advis output file slightly before running the DDL statements that were generated by
the Design Advisor.
Because database partitioning must be set up before you can run the DDL script that the Design
Advisor generates, suggestions are commented out of the script that is returned. It is up to you to
transform your tables in accordance with the suggestions.
Restrictions on index
• Indexes that are suggested for MQTs are designed to improve workload performance, not MQT refresh
performance.
• A clustering RID index is suggested only for MDC tables. The Design Advisor will include clustering RID indexes as an
option rather than create an MDC structure for the table.
• The Version 9.7 Design Advisor does not suggest partitioned indexes on a partitioned table. All
indexes are suggested with an explicit NOT PARTITIONED clause.
Restrictions on MQT
• The Design Advisor does not suggest the use of incremental MQTs. If you want to create incremental
MQTs, you can convert REFRESH IMMEDIATE MQTs into incremental MQTs with your choice of staging
tables.
Restrictions on MDC
• An existing table must be populated with sufficient data before the Design Advisor considers MDC for
the table. A minimum of twenty to thirty megabytes of data is suggested. Tables that are smaller than
12 extents are excluded from consideration.
• MDC requirements for new MQTs will not be considered unless the sampling option, -r, is used with the
db2advis command.
• The Design Advisor does not make MDC suggestions for typed, temporary, or federated tables.
• Sufficient storage space (approximately 1% of the table data for large tables) must be available for the
sampling data that is used during the execution of the db2advis command.
• Tables that have not had statistics collected are excluded from consideration.
• The Design Advisor does not make suggestions for multicolumn dimensions.
Additional restrictions
Temporary simulation catalog tables are created when the Design Advisor runs. An incomplete run can
result in some of these tables not being dropped. In this situation, you can use the Design Advisor to drop
these tables by restarting the utility. To remove the simulation catalog tables, specify both the -f option
and the -n option (for -n, specifying the same user name that was used for the incomplete execution). If
you do not specify the -f option, the Design Advisor will only generate the DROP statements that are
required to remove the tables; it will not actually remove them.
Note: As of Version 9.5, the -f option is the default. This means that if you run db2advis with the MQT
selection, the database manager automatically drops all local simulation catalog tables using the same
user ID as the schema name.
You should create a separate table space on the catalog database partition for storing these simulated
catalog tables, and set the DROPPED TABLE RECOVERY option on the CREATE or ALTER TABLESPACE
statement to OFF. This enables easier cleanup and faster Design Advisor execution.
Troubleshooting tools
Tools are available to help collect, format or analyze diagnostic data.
Procedure
To check your archive log files, you issue the db2cklog command from the command line and include
the log file or files you want checked. Note that you do not specify full log file names with the db2cklog
command but only the numeric identifiers that are part of the log file names. The numeric identifier of the
S0000001.LOG log file is 1, for example; to check that log file, you specify db2cklog 1. If the archive
log files are not in the current directory, include the relative or absolute path to the log files with the
optional ARCHLOGPATH parameter.
1. If you want to check the validity of a single archive log file, you specify the numeric identifier of that log
file as log-file-number1 with the command. For example, to check the validity of the S0000000.LOG
log file in the /home/amytang/tests directory, you issue the command db2cklog 0
ARCHLOGPATH /home/amytang/tests.
2. If you want to check the validity of a range of archive log files, you include the first and last numeric
identifier of that range with the command (from log-file-number1 to log-file-number2). All log files in
the range are checked, unless the upper end of the range specified with log-file-number2 is
numerically lower than the beginning of the range (specified with log-file-number1). In that case, only
log-file-number1 is checked. For example, to check the validity of the log files ranging from
S0000000.LOG to S0000005.LOG in the /home/nrichers/tests directory, you issue the
command db2cklog 0 TO 5 ARCHLOGPATH /home/nrichers/tests
Results
The db2cklog tool will return a return code of zero for any file that passes validation. If a range of
numbered archive log files is specified, the db2cklog tool will read each file in sequence, perform its
checks and issue a return code for each file. The tool stops at the first error it encounters, even if a range
of log files was specified and there are additional files the tool has not yet checked. The DBT message
that is returned when an error is found can provide you with some more information about why an archive
log file failed validation, but you cannot fix an invalid log file. If you receive a DBT warning message that a
Examples
The following example shows the typical output of the db2cklog command as it parses a log file, in this
case S0000002.LOG. This file passes validation with a return code of zero.
$ db2cklog 2
____________________________________________________________________
_____ D B 2 C K L O G _____
____________________________________________________________________
________________________________________________________________________________
========================================================
"db2cklog": Processing log file header of "S0000002.LOG"
What to do next
If an archive log file fails validation, your response depends on whether or not you have a copy of the log
file that can pass validation by the db2cklog tool. If you are not sure whether you have a copy of the log
file, check the setting for the logarchmeth2 configuration parameter, which determines whether your
database server archives a secondary copy of each log file. If you are validating logs as they are being
archived and log mirroring is also configured on your data server, you might still be able to locate a copy of
the log file in the log mirror path, as your data server does not recycle log files immediately after
archiving.
• If you have a copy of the archive log file, use the db2cklog command against that copy. If the copy of
the log file passes validation, replace the log file that cannot be read with the valid copy of the log file.
• If you have only one copy of the archive log file and that copy cannot be validated, the log file is beyond
repair and cannot be used for rollforward recovery purposes. In this case, you must make a full
database backup as soon as possible to establish a new, more recent recovery point that does not
depend on the unusable log file for rollforward recovery.
Table 86. Feature comparison of db2dart and INSPECT for table spaces

Tests performed                                               db2dart   INSPECT
SMS table spaces
Check table space files                                       YES       NO
Validate contents of internal page header fields              YES       YES
DMS table spaces
Check for extent maps pointed at by more than one object      YES       NO
Check every extent map page for consistency bit errors        NO        YES
Check every space map page for consistency bit errors         NO        YES
Table 87. Feature comparison of db2dart and INSPECT for data objects

Tests performed                                                            db2dart   INSPECT
Check data objects for consistency bit errors                              YES       YES
Check the contents of special control rows                                 YES       NO
Check the length and position of variable length columns                   YES       NO
Check the LONG VARCHAR, LONG VARGRAPHIC, and large object (LOB)
descriptors in table rows                                                  YES       NO
Check the summary total pages, used pages and free space percentage        NO        YES
Validate contents of internal page header fields                           YES       YES
Verify each row record type and its length                                 YES       YES
Verify that rows are not overlapping                                       YES       YES
Table 88. Feature comparison of db2dart and INSPECT for index objects

Tests performed                                                            db2dart   INSPECT
Check for consistency bit errors                                           YES       YES
Check the location and length of the index key and whether there is
overlapping                                                                YES       YES
Check the ordering of keys in the index                                    YES       NO
Determine the summary total pages and used pages                           NO        YES
Validate contents of internal page header fields                           YES       YES
Verify the uniqueness of unique keys                                       YES       NO
Table 89. Feature comparison of db2dart and INSPECT for block map objects
Tests performed                                                db2dart   INSPECT
Check for consistency bit errors                               YES       YES
Determine the summary total pages and used pages               NO        YES
Validate contents of internal page header fields               YES       YES
Table 90. Feature comparison of db2dart and INSPECT for long field and LOB objects
Tests performed                                                db2dart   INSPECT
Check the allocation structures                                YES       YES
Determine the summary total pages and used pages (for LOB
  objects only)                                                NO        YES
In addition, the following actions can be performed using the db2dart command:
• Format and dump data pages
• Format and dump index pages
• Format data rows to delimited ASCII
• Mark an index invalid
The INSPECT command cannot be used to perform those actions.
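For example, invocations of the following form are sketches of two of these actions; the database name and the table space and object identifiers shown are illustrative values only:
db2dart sample /DDEL
db2dart sample /MI /TSI 2 /OI 4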
db2diag -g db=SAMPLE
Note that this command could have been written a couple of different ways, including db2diag -l
severe -pid 2200 -n 0,1,2,3. Note also that the -g option specifies a case-sensitive
search, so "Severe" works here but "severe" fails. These commands successfully
retrieve db2diag log file records that meet these requirements, such as:
db2diag -time 2006-01-01 -node "0,1,2" -level "Severe, Error" | db2diag -fmt
"Time: %{ts}
Partition: %node Message Level: %{level} \nPid: %{pid} Tid: %{tid}
Instance: %{instance}\nMessage: @{msg}\n"
To display messages from the OPTSTATS facility and filter out records having a level of Severe:
To display messages from all facilities available and filter out records having instance=harmistr and
level=Error:
To display all messages from the OPTSTATS facility having a level of Error and then output the
Timestamp and PID fields in a specific format:
db2diag -fac optstats -level Error -fmt " Time :%{ts} Pid :%{pid}"
Example 6: Merging diagnostic directory path files from a single host and sorting records by
timestamps
By default, each member and CF log to a different db2diag log file. The following is a list of the three
db2diag log files to merge:
• ~/sqllib/db2dump/DIAG0000/db2diag.log
• ~/sqllib/db2dump/DIAG0001/db2diag.log
• ~/sqllib/db2dump/DIAG0002/db2diag.log
To merge the three diagnostic log files and sort the records according to timestamps, execute the
following command:
db2diag -merge
Example 7: Merging diagnostic directory path files from multiple hosts and database partitions
This example shows how to obtain an output of all the records from all the diagnostic logs and merge the
diagnostic log files from three database partitions on each of two hosts, bower and horton. The
following list shows the six db2diag log files:
• ~/sqllib/db2dump/HOST_bower/DIAG0000/db2diag.log
• ~/sqllib/db2dump/HOST_bower/DIAG0001/db2diag.log
• ~/sqllib/db2dump/HOST_bower/DIAG0002/db2diag.log
• ~/sqllib/db2dump/HOST_horton/DIAG0003/db2diag.log
• ~/sqllib/db2dump/HOST_horton/DIAG0004/db2diag.log
• ~/sqllib/db2dump/HOST_horton/DIAG0005/db2diag.log
To output the records from all six db2diag log files, run the following command:
db2diag -global
To merge all six db2diag log files in the diagnostic data directory path from all three database partitions
on each of the hosts bower and horton and format the output based on the timestamp, execute the
following command:
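db2diag -global -merge -sdir /temp/keon -fmt %{ts}
(One possible form of the command; the -sdir option specifies the shared directory and the -fmt option formats the output on the timestamp.)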
where /temp/keon is a shared directory, shared by the hosts bower and horton, to store temporary
merged files from each host during processing. The merged output, sorted by timestamp, is similar to the following:
2010-10-08-04.46.02.092192
2010-10-08-04.46.02.092821
2010-10-08-04.46.02.093497
2010-10-08-04.46.02.094431
2010-10-08-04.46.02.095317
2010-10-08-04.46.05.068648
2010-10-08-04.46.05.069212
2010-10-08-04.46.05.069900
2010-10-08-04.46.05.071008
2010-10-08-04.46.05.071831
2010-10-08-04.46.07.302051
2010-10-08-04.46.07.302727
You can also filter recent diagnostic log records further to return only messages of a specific level. For
example, to return only those records in the last 10 records that have a severe message level, enter:
2010-08-11-04.11.33.733807
2010-08-11-04.11.33.735398
$ db2diag -A
db2diag: Moving "/home/usr1/clidriver/db2dump/db2diag.log"
to "/home/usr1/clidriver/db2dump/db2diag.log_2010-09-14-01.16.26"
Note: The following commands can produce the same results on an instance-less client.
If you specify options other than -archive or -A, an error message is returned. For example:
$ db2diag -x
db2diag: Unrecognized option: -x
DB21085I Instance "DB2" uses "32" bits and Db2 code release "SQL09010" with
level identifier "01010107".
Informational tokens are "Db2 v9.1.0.189", "n060119", "", and Fix Pack "0".
Product is installed at "c:\SQLLIB" with Db2 Copy Name "db2build".
The combination of the four informational tokens uniquely identifies the precise service level of your Db2
instance. This information is essential when contacting IBM Software Support for assistance.
For JDBC or SQLJ applications, if you are using the IBM Db2 Driver for SQLJ and JDBC, you can determine
the level of the driver by running the db2jcc utility:
db2jcc -version
...
------------------------------------------------
-- DDL Statements for table "DB2"."ORG"
------------------------------------------------
Once you have changed the connect statement, run the statements, as follows:
Take a look at the sample2.out output file -- everything should have been executed successfully. If
errors have occurred, the error messages should state what the problem is. Fix those problems and run
the statements again.
As you can see in the output, the DDL for all of the user tables is exported. This is the default behavior, but
there are other options available to be more specific about the tables included. For example, to only
include the STAFF and ORG tables, use the -t option:
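db2look -d sample -e -t staff org -o sample.ddl
(The database name and output file shown here are illustrative.)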
To only include tables with the schema Db2, use the -z option:
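db2look -d sample -e -z db2 -o sample.ddl
(Again, the database name and output file are illustrative.)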
As before, the output file must be edited such that the CONNECT TO SAMPLE statement is changed to
CONNECT TO SAMPLE2. Again, take a look at the rest of the file to see what some of the RUNSTATS and
UPDATE statements contain:
...
-- Mimic table ORG
RUNSTATS ON TABLE "DB2"."ORG" ;
UPDATE SYSSTAT.INDEXES
SET NLEAF=-1,
NLEVELS=-1,
FIRSTKEYCARD=-1,
FIRST2KEYCARD=-1,
FIRST3KEYCARD=-1,
FIRST4KEYCARD=-1,
FULLKEYCARD=-1,
CLUSTERFACTOR=-1,
CLUSTERRATIO=-1,
SEQUENTIAL_PAGES=-1,
PAGE_FETCH_PAIRS='',
DENSITY=-1,
AVERAGE_SEQUENCE_GAP=-1,
AVERAGE_SEQUENCE_FETCH_GAP=-1,
AVERAGE_SEQUENCE_PAGES=-1,
AVERAGE_SEQUENCE_FETCH_PAGES=-1,
AVERAGE_RANDOM_PAGES=-1,
AVERAGE_RANDOM_FETCH_PAGES=-1,
NUMRIDS=-1,
NUMRIDS_DELETED=-1,
NUM_EMPTY_LEAFS=-1
WHERE TABNAME = 'ORG' AND TABSCHEMA = 'DB2 ';
...
As with the -e option that extracts the DDL, the -t and -z options can be used to specify a set of tables.
CONNECT TO SAMPLE;
--------------------------------------------------------
-- Database and Database Manager configuration parameters
--------------------------------------------------------
...
---------------------------------
-- Environment Variables settings
---------------------------------
COMMIT WORK;
CONNECT RESET;
Note: Only those parameters and variables that affect the Db2 compiler are included. If a registry variable
that affects the compiler is set to its default value, it will not show up under "Environment Variables
settings".
Listing Db2 database products installed on your system (Linux and UNIX)
On supported Linux and UNIX operating systems, the db2ls command lists the Db2 database products
and features installed on your system, including the Db2 Version 11.5 HTML documentation.
At least one Db2 Version 9 (or later) database product must already be installed by a root user for a
symbolic link to the db2ls command to be available in the /usr/local/bin directory.
With the ability to install multiple copies of Db2 database products on your system and the flexibility to
install Db2 database products and features in the path of your choice, you need a tool to help you keep
track of what is installed and where it is installed. On supported Linux and UNIX operating systems, the
db2ls command lists the Db2 products and features installed on your system, including the Db2 HTML
documentation.
The db2ls command can be found both in the installation media and in a Db2 install copy on the system.
The db2ls command can be run from either location. The db2ls command can be run from the
installation media for all products except IBM Data Server Driver Package.
The db2ls command can be used to list:
• Where Db2 database products are installed on your system and list the Db2 database product level
• All or specific Db2 database products and features in a particular installation path
Restrictions
The output that the db2ls command lists is different depending on the ID used:
• When the db2ls command is run with root authority, only root Db2 installations are queried.
• When the db2ls command is run with a non-root ID, root Db2 installations and the non-root
installation owned by matching non-root ID are queried. Db2 installations owned by other non-root IDs
are not queried.
The db2ls command is the only method to query a Db2 database product. You cannot query Db2
database products using Linux or UNIX operating system native utilities, such as pkginfo, rpm, SMIT, or
swlist. Any existing scripts containing a native installation utility that you use to query and interface
with Db2 installations must be changed.
You cannot use the db2ls command on Windows operating systems.
Procedure
• To list the path where Db2 database products are installed on your system and list the Db2 database
product level, enter:
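db2ls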
The command lists the following information for each Db2 database product installed on your system:
– Installation path
– Level
– Fix pack
– Special Install Number. This column is used by IBM Db2 Support.
– Installation date. This column shows when the Db2 database product was last modified.
– Installer UID. This column shows the UID with which the Db2 database product was installed.
• To list information about Db2 database products or features in a particular installation path, the q
parameter must be specified:
db2ls -q -p -b baseInstallDirectory
where:
– q specifies that you are querying a product or feature. This parameter is mandatory.
– p specifies that the listing displays products rather than listing the features.
– b specifies the installation directory of the product or feature. This parameter is mandatory if you
are not running the command from the installation directory.
Results
Depending on the parameters provided, the command lists the following information:
• Installation path. This is specified only once, not for each feature.
• The following information is displayed:
– Response file ID for the installed feature, or if the p option is specified, the response file ID for the
installed product. For example, ENTERPRISE_SERVER_EDITION.
– Feature name, or if the p option is specified, product name.
– Product version, release, modification level, fix pack level (VRMF). For example, 10.1.0.0
– Fix pack, if applicable. For example, if Fix Pack 1 is installed, the value displayed is 1. This includes
interim fix packs, such as Fix Pack 1a.
• If any of the product's VRMF information does not match, a warning message is displayed at the end of the
output listing. The message suggests the fix pack to apply.
Overview
The tool collects information without acquiring any latches or using any engine resources. It is therefore
possible (and expected) to retrieve information that is changing while db2pd is collecting information;
hence the data might not be completely accurate. If changing memory pointers are encountered, a signal
handler is used to prevent db2pd from ending abnormally. This can result in messages such as "Changing
data structure forced command termination" to appear in the output. Nonetheless, the tool can be helpful
for troubleshooting. Two benefits to collecting information without latching include faster retrieval and no
competition for engine resources.
If you want to capture information about the database management system when a specific SQLCODE,
ZRC code or ECF code occurs, this can be accomplished using the db2pdcfg -catch command. When
the errors are caught, the db2cos (callout script) is launched. The db2cos script can be dynamically
altered to run any db2pd command, operating system command, or any other command needed to solve the problem.
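For example, a command of the following form (the locktimeout error type and the count value shown are illustrative) launches the db2cos script the next five times a lock timeout occurs:
db2pdcfg -catch locktimeout count=5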
Examples
The following list is a collection of examples in which the db2pd command can be used to expedite
troubleshooting:
• Example 1: Diagnosing a lockwait
• Example 2: Using the -wlocks parameter to capture all the locks being waited on
• Example 3: Displaying the table name and schema name of locks
• Example 4: Using the -apinfo parameter to capture detailed runtime information about the lock owner
and the lock waiter
• Example 5: Using the callout scripts when considering a locking problem
• Example 6: Mapping an application to a dynamic SQL statement
• Example 7: Monitoring memory usage
• Example 8: Determine which application is using up your table space
• Example 9: Monitoring recovery
• Example 10: Determining the amount of resources a transaction is using
• Example 11: Monitoring log usage
• Example 12: Viewing the sysplex list
• Example 13: Generating stack traces
• Example 14: Viewing memory statistics for a database partition
• Example 15: Monitoring the progress of index reorganization
• Example 16: Displaying the top EDUs by processor time consumption and displaying EDU stack
information
• Example 17: Displaying agent event metrics
• Example 18: Displaying the extent movement status
The results text shown in the examples is an extract of the db2pd command output, edited for better readability.
Example 1: Diagnosing a lockwait
If you run db2pd -db databasename -locks -transactions -applications -dynamic, the
results are similar to the following ones:
Locks:
TranHdl Lockname Type Mode Sts Owner Dur HldCnt Att ReleaseFlg
3 00020002000000040000000052 Row ..X G 3 1 0 0x0000 0x40000000
2 00020002000000040000000052 Row ..X W* 2 1 0 0x0000 0x40000000
For the database that you specified using the -db database name option, the first results show the locks
for that database. The results show that TranHdl 2 is waiting on a lock held by TranHdl 3.
Transactions:
AppHandl [nod-index] TranHdl Locks State Tflag Tflag2 ...
11 [000-00011] 2 4 READ 0x00000000 0x00000000 ...
12 [000-00012] 3 4 WRITE 0x00000000 0x00000000 ...
Applications:
AppHandl NumAgents CoorPid Status C-AnchID C-StmtUID L-AnchID L-StmtUID Appid
We can see that AppHandl 12 last ran dynamic statement 17, 1. AppHandl 11 is currently running
dynamic statement 17, 1 and last ran statement 94, 1.
We can see that the text column shows the SQL statements that are associated with the lock timeout.
Example 2: Using the -wlocks parameter to capture all the locks being waited on
If you run db2pd -wlocks -db pdtest, results similar to the following ones are generated. They show
that the first application (AppHandl 47) is performing an insert on a table and that the second application
(AppHandl 46) is performing a select on that table:
Locks:
Address TranHdl Lockname Type Mode Sts Owner Dur HoldCount Att ReleaseFlg rrIID
0x00002AAAFFFA5F68 3 02000400000020000000000062 MdcBlockLock ..X G 3 1 0 0x00200000 0x40000000 0
0x00002AAAFFFA7198 3 41414141414A4863ADA1ED24C1 PlanLock ..S G 3 1 0 0x00000000 0x40000000 0
TableNm SchemaNm
T1 YUQZHANG 02000400000020000000000062 SQLP_MDCBLOCK (obj={2;4}, bid=d(0;32;0), x0000200000000000)
N/A N/A 41414141414A4863ADA1ED24C1 SQLP_PLAN ({41414141 63484A41 24EDA1AD}, loading=0)
You can also use the db2pd -wlocks detail command to display the table name, schema name, and
application node of locks that are being waited on as shown in the following output.
Database Member 0 -- Database PDTEST -- Active -- Up 0 days 00:00:35 -- Date 2012-11-06-11.11.32.403994
Application :
Address : 0x0780000001676480
AppHandl [nod-index] : 47 [000-00047]
Application PID : 876558
Application Node Name : boson
IP Address: n/a
Connection Start Time : (1197063450)Fri Dec 7 16:37:30 2007
Client User ID : venus
System Auth ID : VENUS
Coordinator EDU ID : 5160
Coordinator Partition : 0
Number of Agents : 1
Locks timeout value : 4294967294 seconds
Locks Escalation : No
Workload ID : 1
Workload Occurrence ID : 2
Trusted Context : n/a
Connection Trust Type : non trusted
Role Inherited : n/a
Application Status : UOW-Waiting
Application Name : db2bp
Application ID : *LOCAL.venus.071207213730
ClientUserID : n/a
ClientWrkstnName : n/a
ClientApplName : n/a
ClientAccntng : n/a
Application :
Address : 0x0780000000D77A60
AppHandl [nod-index] : 46 [000-00046]
Application PID : 881102
Application Node Name : boson
IP Address: n/a
Connection Start Time : (1197063418)Fri Dec 7 16:36:58 2007
Client User ID : venus
System Auth ID : VENUS
Coordinator EDU ID : 5913
Coordinator Partition : 0
Number of Agents : 1
Locks timeout value : 4294967294 seconds
Locks Escalation : No
Workload ID : 1
Workload Occurrence ID : 1
Trusted Context : n/a
Connection Trust Type : non trusted
Role Inherited : n/a
Application Status : Lock-wait
Application Name : db2bp
Application ID : *LOCAL.venus.071207213658
ClientUserID : n/a
ClientWrkstnName : n/a
ClientApplName : n/a
In the output, W* indicates the lock that experienced the timeout. In this case, a lockwait has occurred. A
lock timeout can also occur when a lock is being converted to a higher mode. This is indicated by C* in the
output.
You can map the results to a transaction, an application, an agent, or even an SQL statement with the
output provided by other db2pd commands in the db2cos file. You can narrow down the output or use
other commands to collect the information that you need. For example, you can use the db2pd -locks
wait parameters to print only locks with a wait status. You can also use the -app and -agent
parameters.
Example 6: Mapping an application to a dynamic SQL statement
The command db2pd -applications -dynamic reports the current and last anchor ID and statement
unique ID for dynamic SQL statements. This allows direct mapping from an application to a dynamic SQL
statement.
Applications:
Address AppHandl [nod-index] NumAgents CoorPid Status
0x00000002006D2120 780 [000-00780] 1 10615 UOW-Executing
The final section of output sorts the consumers of memory for the entire memory set:
You can also report memory blocks for private memory on UNIX and Linux operating systems. For
example, if you run db2pd -memb pid=159770, results similar to the following ones are generated:
You can then obtain the information for table space 3 by using the db2pd -tablespaces command.
Sample output is as follows:
Tablespace 3 Configuration:
Type Content PageSz ExtentSz Auto Prefetch BufID FSC RSE NumCntrs MaxStripe LastConsecPg Name
DMS UsrTmp 4096 32 Yes 32 1 On Yes 1 0 31 TEMPSPACE2
Tablespace 3 Statistics:
TotalPgs UsablePgs UsedPgs PndFreePgs FreePgs HWM State MinRecTime NQuiescers
5000 4960 1088 0 3872 1088 0x00000000 0 0
Containers:
ContainNum Type TotalPgs UseablePgs StripeSet Container
0 File 5000 4960 0 /home/db2inst1/tempspace2a
The MinRecTime column returns a value that is a UNIX timestamp in a UTC timezone format. To convert
the value to a GMT time zone format, you can use the Db2 TIMESTAMP function. For example, if
MinRecTime returns a value of 1369626329, to convert this value to a GMT format run the following
statement:
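One way to express the conversion is to add the value as a seconds duration to the start of the UNIX epoch (the exact statement can vary):
SELECT TIMESTAMP('1970-01-01-00.00.00') + 1369626329 SECONDS AS MINRECTIME
  FROM SYSIBM.SYSDUMMY1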
Dynamic Cache:
Current Memory Used 1022197
Total Heap Size 1271398
Cache Overflow Flag 0
Number of References 237
Number of Statement Inserts 32
Number of Statement Deletes 13
Number of Variation Inserts 21
Number of Statements 19
Finally, you can map the information from the preceding output to the applications output to identify the
application by running db2pd -db sample -app.
Applications:
AppHandl [nod-index] NumAgents CoorPid Status C-AnchID C-StmtUID
501 [000-00501] 1 11246 UOW-Waiting 0 0
You can use the anchor ID (AnchID) value that identified the dynamic SQL statement to identify the
associated application. The results show that the last anchor ID (L-AnchID) value is the same as the
anchor ID (AnchID) value. You use the results from one run of db2pd in the next run of db2pd.
The output from db2pd -agent shows the number of rows read (in the Rowsread column) and rows
written (in the Rowswrtn column) by the application. These values give you an idea of what the
application has completed and what the application still has to complete, as shown in the following
sample output:
You can map the values for AppHandl and AgentPid resulting from running the db2pd -agent
command to the corresponding values for AppHandl and CoorPid resulting from running the db2pd -app
command.
The steps are slightly different if you suspect that an internal temporary table is filling up the table space.
You still use db2pd -tcbstats to identify tables with large numbers of inserts, however. Following is
sample information for an implicit temporary table:
Tablespace Configuration:
Id Type Content PageSz ExtentSz Auto Prefetch ... FSC RSE NumCntrs MaxStripe LastConsecPg Name
1 SMS SysTmp 4096 32 Yes 320 ... On Yes 10 0 31 TEMPSPACE1
Tablespace Statistics:
Id TotalPgs UsablePgs UsedPgs PndFreePgs FreePgs HWM State MinRecTime NQuiescers
1 6516 6516 6516 0 0 0 0x00000000 0 0
Containers:
...
You can then identify application handles 30 and 31 (because you saw them in the -tcbstats output) by
using the command db2pd -app:
Applications:
AppHandl NumAgents CoorPid Status C-AnchID C-StmtUID L-AnchID L-StmtUID Appid
31 1 4784182 UOW-Waiting 0 0 107 1 ...4142
30 1 8966270 UOW-Executing 107 1 107 1 ...4013
Finally, map the information from the preceding output to the Dynamic SQL output obtained by running
the db2pd -dyn command:
Recovery:
Recovery Status 0x00000401
Current Log S0000005.LOG
Current LSN 0000001F07BC
Current LSO 000002551BEA
Job Type ROLLFORWARD RECOVERY
Job ID 7
Job Start Time (1107380474) Wed Feb 2 16:41:14 2005
Job Description Database Rollforward Recovery
Invoker Type User
Total Phases 2
Current Phase 1
Progress:
Address PhaseNum Description StartTime CompletedWork TotalWork
0x0000000200667160 1 Forward Wed Feb 2 16:41:14 2005 2268098 bytes Unknown
0x0000000200667258 2 Backward NotStarted 0 bytes Unknown
Transactions:
Address AppHandl [nod-index] TranHdl Locks State Tflag
0x000000022026D980 797 [000-00797] 2 108 WRITE 0x00000000
0x000000022026E600 806 [000-00806] 3 157 WRITE 0x00000000
0x000000022026F280 807 [000-00807] 4 90 WRITE 0x00000000
Logs:
Current Log Number 2
Pages Written 846
Method 1 Archive Status Success
Method 1 Next Log to Archive 2
Method 1 First Failure n/a
Method 2 Archive Status Success
Method 2 Next Log to Archive 2
Method 2 First Failure n/a
Sysplex List:
Alias: HOST
Location Name: HOST1
If the call stacks for all of the Db2 processes are desired, use the command db2pd -stack all, for
example (on Windows operating systems):
If you are using a partitioned database environment with multiple physical nodes, you can obtain the
information from all of the partitions by using the command db2_all "; db2pd -stack all". If the
partitions are all logical partitions on the same machine, however, a faster method is to use db2pd -
alldbp -stacks.
You can also redirect the output of the db2pd -stacks command for db2sysc processes to a specific
directory path with the dumpdir parameter. The output can be redirected for a specific duration only with
the timeout parameter. For example, to redirect the output of stack traces for all EDUs in db2sysc
processes to /home/waleed/mydir for 30 seconds, issue the following command:
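db2pd -stacks dumpdir=/home/waleed/mydir timeout=30
(This is one possible form of the command; the dumpdir and timeout parameters are the ones described previously.)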
Controller Automatic: Y
Memory Limit: 122931408 KB
Current usage: 651008 KB
HWM usage: 651008 KB
Cached memory: 231296 KB
All registered "consumers" of instance memory within the Db2 server are listed with the amount of the
total instance memory they are consuming. The column descriptions are as follows:
Name
A short, distinguishing name of a consumer of instance memory, such as the following ones:
APPL-dbname
Application memory consumed for database dbname
DBMS-name
Global database manager memory requirements
FMP_RESOURCES
Memory required to communicate with db2fmps
PRIVATE
Miscellaneous private memory requirements
FCM_RESOURCES
Fast Communication Manager resources
LCL-pid
The memory segment used to communicate with local applications
DB-dbname
Database memory consumed for database dbname
Mem Used (KB)
The amount of instance memory that is currently allotted to the consumer
HWM Used (KB)
The high-water mark (HWM), or peak instance memory, that the consumer has used
Cached (KB)
Of the Mem Used (KB), the amount of instance memory that may be reclaimed for this consumer.
Example 15: Monitoring the progress of index reorganization
In Db2 Version 9.8 Fix Pack 3 and later fix packs, the progress report of an index reorganization has the
following characteristics:
• The db2pd -reorgs index command reports index reorg progress for partitioned indexes (Fix Pack 1
introduced support for only non-partitioned indexes).
Example 16: Displaying the top EDUs by processor time consumption and displaying EDU stack
information
If you issue the db2pd command with the -edus parameter option, the output lists all engine
dispatchable units (EDUs). Output for EDUs can be returned at the level of granularity you specify, such as
EDU ID TID Kernel TID EDU Name USR SYS USR DELTA SYS DELTA
================================================================================================
6957 6957 13889683 db2agntdp (SAMPLE ) 0 58.238506 0.820466 1.160726 0.014721
6700 6700 11542589 db2agent (SAMPLE) 0 52.856696 0.754420 1.114821 0.015007
5675 5675 4559055 db2agntdp (SAMPLE ) 0 60.386779 0.854234 0.609233 0.014304
3088 3088 13951225 db2agntdp (SAMPLE ) 0 80.073489 2.249843 0.499766 0.006247
3615 3615 2887875 db2loggw (SAMPLE) 0 0.939891 0.410493 0.011694 0.004204
4900 4900 6344925 db2pfchr (SAMPLE) 0 1.748413 0.014378 0.014343 0.000103
7986 7986 13701145 db2agntdp (SAMPLE ) 0 1.410225 0.025900 0.003636 0.000074
2571 2571 8503329 db2ipccm 0 0.251349 0.083787 0.002551 0.000857
7729 7729 14168193 db2agntdp (SAMPLE ) 0 1.717323 0.029477 0.000998 0.000038
7472 7472 11853991 db2agnta (SAMPLE) 0 1.860115 0.032926 0.000860 0.000012
3358 3358 2347127 db2loggr (SAMPLE) 0 0.151042 0.184726 0.000387 0.000458
515 515 13820091 db2aiothr 0 0.405538 0.312007 0.000189 0.000178
7215 7215 2539753 db2agntdp (SAMPLE ) 0 1.165350 0.019466 0.000291 0.000008
6185 6185 2322517 db2wlmd (SAMPLE) 0 0.061674 0.034093 0.000169 0.000100
6442 6442 2756793 db2evmli (DB2DETAILDEADLOCK) 0 0.072142 0.052436 0.000092 0.000063
4129 4129 15900799 db2glock (SAMPLE) 0 0.013239 0.000741 0.000064 0.000001
2 2 11739383 db2alarm 0 0.036904 0.028367 0.000009 0.000009
4386 4386 13361367 db2dlock (SAMPLE) 0 0.015653 0.001281 0.000014 0.000003
1029 1029 15040579 db2fcms 0 0.041929 0.016598 0.000010 0.000004
5414 5414 14471309 db2pfchr (SAMPLE) 0 0.000093 0.000002 0.000000 0.000000
258 258 13656311 db2sysc 0 8.369967 0.263539 0.000000 0.000000
5157 5157 7934145 db2pfchr (SAMPLE) 0 0.027598 0.000177 0.000000 0.000000
1543 1543 2670647 db2fcmr 0 0.004191 0.000079 0.000000 0.000000
1286 1286 8417339 db2extev 0 0.000312 0.000043 0.000000 0.000000
2314 2314 14360813 db2licc 0 0.000371 0.000051 0.000000 0.000000
5928 5928 3137537 db2taskd (SAMPLE) 0 0.004903 0.000572 0.000000 0.000000
3872 3872 2310357 db2lfr (SAMPLE) 0 0.000126 0.000007 0.000000 0.000000
4643 4643 11694287 db2pclnr (SAMPLE) 0 0.000094 0.000002 0.000000 0.000000
1800 1800 5800175 db2extev 0 0.001212 0.002137 0.000000 0.000000
772 772 7925817 db2thcln 0 0.000429 0.000072 0.000000 0.000000
2057 2057 6868993 db2pdbc 0 0.002423 0.001603 0.000000 0.000000
2828 2828 10866809 db2resync 0 0.016764 0.003098 0.000000 0.000000
To provide information only about the EDUs that are the top consumers of processor time and to reduce
the amount of output returned, you can further include the top parameter option. In the following
example, only the top five EDUs are returned, across an interval of 5 seconds. Stack information is also
returned, and can be found stored separately in the directory path specified by DUMPDIR, which defaults
to diagpath.
EDU ID TID Kernel TID EDU Name USR SYS USR DELTA SYS DELTA
======================================================================================
3358 3358 2347127 db2loggr (SAMPLE) 0 0.154906 0.189223 0.001087 0.001363
3615 3615 2887875 db2loggw (SAMPLE) 0 0.962744 0.419617 0.001779 0.000481
$ ls -ltr
total 552
drwxrwxr-t 2 vbmithun build 256 05-31 09:59 events/
drwxrwxr-t 2 vbmithun build 256 06-04 03:17 stmmlog/
-rw-r--r-- 1 vbmithun build 46413 06-04 03:35 1249522.3358.000.stack.txt
-rw-r--r-- 1 vbmithun build 22819 06-04 03:35 1249522.3615.000.stack.txt
-rw-r--r-- 1 vbmithun build 20387 06-04 03:35 1249522.515.000.stack.txt
-rw-r--r-- 1 vbmithun build 50426 06-04 03:35 1249522.258.000.stack.txt
-rw-r--r-- 1 vbmithun build 314596 06-04 03:35 1249522.6700.000.stack.txt
-rw-r--r-- 1 vbmithun build 94913 06-04 03:35 1249522.000.processObj.txt
Agents:
Current agents: 12
Idle agents: 0
Active coord agents: 10
Active agents total: 10
Pooled coord agents: 2
Pooled agents total: 2
Extent Movement:
Address TbspName Current Last Moved Left TotalTime
0x00002AAB356D4BA0 DAVID 1168 1169 33 426 329636
The db2support tool collects most diagnostic data specific to Db2 pureScale components by default.
If you specify the -purescale, -cm, -cfs, or -udapl parameter, the db2support command
collects additional diagnostic data that is space intensive or takes a longer time to collect, but helps
determine the source of the problem faster in Db2 pureScale environments.
The output is conveniently collected and stored in a compressed ZIP archive, db2support.zip, so
that it can be transferred and extracted easily on any system.
Results
The type of information that db2support captures depends on the way the command is invoked,
whether the database manager is started, and whether it is possible to connect to the database.
The db2support utility collects the following information under all conditions:
• db2diag log files
• All trap files
• Locklist files
• Dump files
• Various system-related files
• Output from various system commands
• db2cli.ini
• db2dsdriver.cfg
Depending on the circumstances, the db2support utility might also collect:
• Active log files
• Buffer pool and table space (SQLSPCS.1 and SQLSPCS.2) control files (with -d option)
• Contents of the db2dump directory
• Extended system information (with -s option)
• Database configuration settings (with -d option)
• Database manager configuration settings files
When you extract the db2support.zip file, you find the following files and directories:
• DB2CONFIG/ - Configuration information (for example, database, database manager, BP, CLI, and Java
developer kit, among others)
• DB2DUMP/ - db2diag log file contents for the past three days
• DB2MISC/ - List of the sqllib directory
• DB2SNAP/ - Output of Db2 commands (for example, db2set, LIST TABLES, LIST INDOUBT
TRANSACTIONS, and LIST APPLICATIONS)
• PURESCALE/- Diagnostic information for Db2 pureScale components, such as cluster manager, cluster
file system and uDAPL
• db2supp_opt.zip - Diagnostic information for optimizer problems
• db2supp_system.zip - Operating system information
• db2support.html - Map to flat files collected in each subdirectory of the db2support.zip file listed in
HTML format and diagnostic information formatted into HTML sections
• db2support.log - Diagnostic log information for db2support collection
• db2support_options.in - Command-line options used to start the db2support collection
• db2support.map - Map to flat files collected in each subdirectory of the db2support.zip file listed in
plain text format
Information about Optimizer can be found in the db2supp_opt.zip file. Extraction of this file finds the
following directories:
• OPTIMIZER/ - Diagnostic information for optimizer problems
• OPTIMIZER/optimizer.log - This file contains a log of all activities. If db2support returns messages, for
example DBT7116E, you might find more information within this log file.
• OPTIMIZER/CATALOGS - All the catalogs with LOBs in the following subdirectories (generated only if
the LOB column in the catalog table is not empty):
– FUNCTIONS
– INDEXES
– NODEGROUPS
– ROUTINES
– SEQUENCES
– TABLES
– VIEWS
• OPTIMIZER/DB2DUMP - db2serv output (serv.* and serv2.* output files)
System information can be found in the db2supp_system.zip file. Extraction of this file finds the
following file and directories:
• DB2CONFIG/ - db2cli.ini (files from ~/sqllib/cfg)
• DB2MISC/ - DB2SYSTM file (binary), among others
• OSCONFIG/ - Different operating system information files (for example, netstat, services, vfs,
ulimit, and hosts)
• OSSNAP/ - Operating system snapshots (for example, iostat, netstat, uptime, vmstat, and
ps_elf)
Example
For example, to validate all the instances for the Db2 copy, run the following command:
db2val -a
For complete db2val command details and further examples, refer to the "db2val - Db2 copy validation
tool command" topic.
DB2 traces
C:\>db2trc
Usage: db2trc (chg|clr|dmp|flw|fmt|inf|off|on) options
For more information about a specific db2trc command parameter, use the -u option. For example, to
see more information about turning the trace on, execute the following command:
db2trc on -u
This will provide information about all of the additional options (labeled as "facilities") that can be
specified when turning on a Db2 trace.
When turning trace on, the most important option is -l. This option specifies the size of the memory buffer that
will be used to store the information being traced. The buffer size can be specified in either bytes or
megabytes. (To specify megabytes append either "M" or "m" after the value). The trace buffer size must be
a power of two megabytes. If you specify a size that does not meet this requirement, the buffer size will
automatically be rounded down to the nearest power of two.
If the buffer is too small, information might be lost. By default only the most recent trace information is
kept if the buffer becomes full. If the buffer is too large, it might be difficult to send the file to the IBM
Software Support team.
If tracing an operation that is relatively short (such as a database connection), a size of approximately 8
MB is usually sufficient:
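db2trc on -l 8m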
However, if you are tracing a larger operation or if a lot of work is going on at the same time, a larger trace
buffer might be required.
On most platforms, tracing can be turned on at any time and works as described previously. However,
there are certain situations to be aware of:
1. On multiple database partition systems, you must run a trace for each physical (as opposed to logical)
database partition.
2. On HP-UX and Linux platforms, if the trace is turned off after the instance has been started, a very
small buffer will be used the next time the trace is started regardless of the size specified. For
example, yesterday you turned trace on by using db2trc on -l 8m, then collected a trace, and then
turned the trace off (db2trc off). Today you want to run a trace with the memory buffer set for 32
megabytes (db2trc on -l 32m) without bringing the instance down and restarting. You will find that
in this case trace will only get a small buffer. To effectively run a trace on these platforms, turn the
trace on before starting the instance, with the buffer size you need, and "clear" the buffer as necessary
afterwards.
To reduce the amount of data collected or formatted, the db2trc command supports several mask
options. Reducing the amount of data collected is useful, because it can reduce the additional processor
usage incurred due to an ongoing trace collection, and because you can collect data more selectively.
Collecting data more selectively can also help speed up problem diagnosis.
You typically use the -m mask option under the guidance of IBM support. However, you can use the -p
mask option to collect a trace only for specific process IDs (and optionally thread IDs). For example, to
enable tracing for process 77 with threads 1, 2, 3, and 4, and for process 88 with threads 5, 6, 7, and 8
the syntax is:
db2trc on -p 77.1.2.3.4,88.5.6.7.8
To modify the kernel shared memory settings, create a sysctl.conf file in the /etc directory by using superuser
privileges.
sudo vi /etc/sysctl.conf
A sample sysctl.conf file with the maximum shared memory set to support 64 MB looks like this:
kern.sysv.shmmax: 67108864
kern.sysv.shmmin: 1
kern.sysv.shmmni: 512
kern.sysv.shmseg: 128
kern.sysv.shmall: 16384
machdep.pmap.hashmax: 14
security.mac.posixshm_enforce: 1
security.mac.sysvshm_enforce: 1
Once the changes are done, restart the system for the changes to take effect.
For example, the following db2trcon invocation runs the trace for a specified duration for the top processor-time-consuming EDUs and generates the flow and format files automatically:
db2trcon -duration 45 -top 5 -interval 15 -flw -fmt
When db2trc is turned off after the specified duration, db2trcon automatically generates the dump, flow
and format files for you.
The db2trcoff script turns off tracing and can generate the dump, flow, and format files automatically with a single
command. For example, to turn db2trc off with -force and generate the flow, format, and dump files, issue the
following command:
db2trcoff -flw -fmt -force
Note that if you turned tracing on with the db2trcon script and specified a duration, you do not need to
issue the db2trcoff command separately.
C:\>db2trc clr
Trace has been cleared
Once the operation being traced has finished, use the dmp option followed by a trace file name to dump
the memory buffer to disk. For example:
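db2trc dmp db2trc.dmp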
The trace facility will continue to run after dumping the trace buffer to disk. To turn tracing off, use the
OFF option:
C:\>db2trc off
Trace is turned off
If the output shows that the value of Trace wrapped is YES, the trace buffer was not large enough to
contain all the information that was collected during the trace period. A wrapped trace might be
acceptable, depending on the situation. If you are interested in the most recent information (the
information that is maintained unless you specified the -i option), what is in the trace file might be
sufficient. However, if you are interested in what happened at the beginning of the trace period or if you
are interested in everything that occurred, you might want to redo the operation with a larger trace buffer.
There are options for formatting a binary file. For example, you can issue db2trc format -xml
trace.dmp trace.fmt to convert the binary data into an XML format that can be parsed. You can also
use the formattedFlow parameter of the db2trc command to parse the binary file into a formatted text
file that is organized in chronological order. You can also create a performance report from a dump file by
using the perfrep parameter. Additional options are shown in the description of the db2trc command.
On Linux and UNIX operating systems, if a severe error occurs, the Db2 software automatically dumps the
trace buffer to disk when it shuts down the instance due to a severe error. If tracing is enabled when an
instance ends abnormally, a file is created in the diagnostic directory. The file name is db2trdmp.nnn,
where nnn is the database partition number. The file creation in the diagnostic directory does not occur
on Windows operating systems when an instance ends abnormally due to an error. You must dump the
trace manually.
To summarize, the following example shows the common sequence of db2trc commands:
db2trc on -l 8M
db2trc clr
<Execute problem recreation commands>
db2trc dump db2trc.dmp
db2trc off
db2trc flw db2trc.dmp <filename>.flw
db2trc fmt db2trc.dmp <filename>.fmt
db2trc fmt -c db2trc.dmp <filename>.fmtc
Trace utility
The db2drdat utility records the data that is interchanged between the Db2 Connect server (on behalf of
the IBM data server client) and the IBM mainframe database server.
As a database administrator (or application developer), you might find it useful to understand how this
flow of data works, because this knowledge can help you determine the origin of a particular problem.
Suppose you found yourself in the following situation: you issue a CONNECT TO database statement for an
IBM mainframe database server, but the command fails and you receive an unsuccessful return code. If
you understand exactly what information was conveyed to the IBM mainframe database server
management system, you might be able to determine the cause of the failure even if the return code
information is general. Many failures are caused by simple user errors.
Output from db2drdat lists the data streams exchanged between the Db2 Connect workstation and the
IBM mainframe database server management system. Data sent to the IBM mainframe database server is
labeled SEND BUFFER and data received from the IBM mainframe database server is labeled RECEIVE
BUFFER.
If a receive buffer contains SQLCA information, it will be followed by a formatted interpretation of this
data and labeled SQLCA. The SQLCODE field of an SQLCA is the unmapped value as returned by the IBM
mainframe database server. The send and receive buffers are arranged from the oldest to the most recent
within the file. Each buffer has:
• The process ID
• A SEND BUFFER, RECEIVE BUFFER, or SQLCA label. The first DDM command or object in a buffer is
labeled DSS TYPE.
The remaining data in send and receive buffers is divided into five columns, consisting of:
• A byte count.
• Columns 2 and 3 represent the DRDA data stream exchanged between the two systems, in ASCII or
EBCDIC.
• An ASCII representation of columns 2 and 3.
• An EBCDIC representation of columns 2 and 3.
Trace output
The trace file that is written by the db2drdat utility contains operational information about DRDA.
The db2drdat utility writes the following information to tracefile:
• -r
– Type of DRDA reply/object
– Receive buffer
• -s
– Type of DRDA request
– Send buffer
• -c
– SQLCA
• TCP/IP error information
– Receive function return code
– Severity
– Protocol used
– API used
SEND BUFFER(AR) and RECEIVE BUFFER(AR) sections alternate throughout the trace file.
Obtaining traces of applications that use the IBM Data Server Driver for JDBC and SQLJ
You can enable IBM Data Server Driver for JDBC and SQLJ tracing using Connection or DataSource
properties, driver global configuration properties, or DB2TraceManager methods.
CLI traces
The CLI trace contains a record of all the CLI function calls that the CLI driver made.
The CLI trace is an essential tool for diagnosing problems with applications that access the CLI driver. The
CLI trace provides diagnostic information when a problem is encountered in any of the following places:
• A CLI application
• An ODBC application, because ODBC applications use the CLI interface to access IBM database servers
• A CLI stored procedure
By default, the trace utility is disabled. When enabled, the trace utility generates one or more trace files
whenever an application accesses the CLI driver. These trace files provide the following information:
• The order in which the application called the CLI functions
• The contents of input and output parameters that were passed to and received from the CLI functions
• The return codes and any error or warning messages that the CLI functions generated
CLI trace file analysis provides a number of benefits. First, subtle program logic and parameter
initialization errors are often evident in the traces. Second, CLI traces might suggest ways of better tuning
an application or the databases that it accesses. For example, if a CLI trace shows that a particular set of
columns is queried many times, you might want to create an index corresponding to one of the columns
to improve application performance. Finally, analysis of CLI trace files can help you understand how a
third-party application or interface is behaving.
Procedure
To obtain a CLI trace:
1. Update the db2cli.ini file with CLI configuration keywords.
You can update the db2cli.ini file by either manually editing the db2cli.ini file or by issuing the
UPDATE CLI CFG command.
• To manually edit the db2cli.ini file:
a. Locate the db2cli.ini file. For more information about the location of the db2cli.ini file, see Call Level Interface
Guide and Reference Volume 1.
b. Open the db2cli.ini file in a plain text editor.
c. Add the following section to the db2cli.ini file or, if the [COMMON] section exists, append the
CLI trace keywords in the following example:
[COMMON]
Trace=1
TracePathName=path
TraceComm=1
TraceFlush=1
TraceTimeStamp=1
If you use the TracePathName keyword, ensure that the path that you specify exists and that it
has global read and write permission.
Note:
– Because CLI trace keywords are in the [COMMON] section of the db2cli.ini file, their
values apply to all database connections that are made through the CLI driver.
– CLI trace keywords are not case-sensitive. However, path and file name keyword values
might be case-sensitive on some operating systems, such as UNIX operating systems.
If you use the TracePathName keyword, ensure that the path that you specify exists and that it
has global read and write permission.
b. Verify the CLI trace keywords in the db2cli.ini configuration file by issuing the following
command:
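db2 GET CLI CFG FOR SECTION COMMON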
Note:
– The IBM Data Server Driver Package and IBM Data Server Driver for ODBC and CLI installations
do not contain the Db2 command line processor. To change the settings of trace configuration
keywords, you can modify the db2cli.ini file manually.
2. To enable the CLI trace, restart the application. If you are tracing a CLI stored procedure, restart the
Db2 instance.
The db2cli.ini file is read only on application initialization, unless the TraceRefreshInterval
keyword is set.
3. Capture the error:
a. Run the application until the error is generated. To reduce the trace size, if possible, run only the
application that is required to replicate the problem.
b. Terminate the application.
4. Disable the CLI trace setting in one of the following ways:
• Set the Trace keyword to a value of 0 in the [COMMON] section of the db2cli.ini file.
• Issue the following command:
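db2 UPDATE CLI CFG FOR SECTION COMMON USING Trace 0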
Results
The CLI trace files are written to the path that you specified for the TracePathName keyword. The file
names have a format of ppidttid.cli. The pid value is the process ID that the operating system
assigns, and the tid value is a numeric counter (starting at 0) for each thread that is generated by the
application process. An example of the file name is p1234t1.cli.
SQLRETURN SQLConnect (
SQLHDBC ConnectionHandle, /* hdbc */
SQLCHAR *FAR ServerName, /* szDSN */
SQLSMALLINT NameLength1, /* cbDSN */
SQLCHAR *FAR UserName, /* szUID */
SQLSMALLINT NameLength2, /* cbUID */
SQLCHAR *FAR Authentication, /* szAuthStr */
SQLSMALLINT NameLength3); /* cbAuthStr */
The initial call to the CLI function shows the input parameters and the values being assigned to them (as
appropriate).
When CLI functions return, they show the resultant output parameters, for example:
SQLAllocStmt( phStmt=1:1 )
<--- SQL_SUCCESS Time elapsed - +4.444000E-003 seconds
In this case, the CLI function SQLAllocStmt() is returning an output parameter phStmt with a value of
"1:1" (connection handle 1, statement handle 1).
The following trace entry shows the binding of the parameter marker as a CHAR with a maximum length
of 7:
SQLExecute( hStmt=1:1 )
---> Time elapsed - +1.317000E-003 seconds
( iPar=1, fCType=SQL_C_CHAR, rgbValue="000010" - X"303030303130",
pcbValue=6, piIndicatorPtr=6 )
sqlccsend( ulBytes - 384 )
sqlccsend( Handle - 14437216 )
sqlccsend( ) - rc - 0, time elapsed - +1.915000E-003
sqlccrecv( )
sqlccrecv( ulBytes - 1053 ) - rc - 0, time elapsed - +8.808000E-003
SQLExecute( )
<--- SQL_SUCCESS Time elapsed - +2.213300E-002 seconds
(This time value indicates the time spent in the application since the last CLI API was called)
SQLAllocStmt( phStmt=1:1 )
<--- SQL_SUCCESS Time elapsed - +4.444000E-003 seconds
(Since the function has completed, this time value indicates the time spent in Db2, including the network
time)
The other way to capture timing information is to use the CLI keyword TraceTimeStamp. This keyword
generates a timestamp for every invocation and result of a CLI API call. The keyword has four display options:
no timestamp information, processor ticks and ISO timestamp, processor ticks, or ISO timestamp.
This can be very useful when working with timing related problems such as CLI0125E - function sequence
errors. It can also be helpful when attempting to determine which event happened first when working
with multithreaded applications.
SQLExecDirect( )
<--- SQL_SUCCESS_WITH_INFO Time elapsed - +1.06E+001 seconds
In this CLI trace, the keyset parser has indicated a return code of 1100, which indicates that there is not a
unique index or primary key for the table, and therefore a keyset cursor cannot be created. These return
You can also trace a multithreaded application to one file by using the CLI keyword TraceFileName. This
method generates one file of your choice, but it can be cumbersome to read, because certain APIs in one
thread can be executed at the same time as an API in another thread, which can cause
some confusion when you review the trace.
It is usually recommended to turn TraceTimeStamp on so that you can determine the true sequence of
events by looking at the time that a certain API was executed. This can be very useful for investigating
problems where one thread caused a problem in another thread (for example, CLI0125E - Function
sequence error).
Note: Trace examples used in this section have line numbers preceding them. These line numbers have
been added to aid the discussion and will not appear in an actual CLI trace.
Immediately following the trace header, there are usually a number of trace entries related to
environment and connection handle allocation and initialization. For example:
10 SQLAllocEnv( phEnv=&bffff684 )
11 ---> Time elapsed - +9.200000E-004 seconds
12 SQLAllocEnv( phEnv=0:1 )
13 <--- SQL_SUCCESS Time elapsed - +7.500000E-004 seconds
16 SQLAllocConnect( phDbc=0:1 )
17 <--- SQL_SUCCESS Time elapsed - +5.280000E-004 seconds
20 SQLSetConnectOption( )
21 <--- SQL_SUCCESS Time elapsed - +3.150000E-004 seconds
25 SQLConnect( )
26 <--- SQL_SUCCESS Time elapsed - +5.209880E-001 seconds
27 ( DSN=""SAMPLE"" )
28 ( UID=" " )
29 ( PWD="*" )
This would mean the application and the database server were using the same code page ( 819 ).
The return trace entry of the SQLConnect() function also contains important connection information
(lines 27-29 in the trace example). Additional information that might be displayed in the return entry
includes any PATCH1 or PATCH2 keyword values that apply to the connection. For example, if
PATCH2=27,28 was specified in the db2cli.ini file under the COMMON section, the following line
should also appear in the SQLConnect() return entry:
( PATCH2="27,28" )
Following the environment and connection related trace entries are the statement related trace entries.
For example:
32 SQLAllocStmt( phStmt=1:1 )
33 <--- SQL_SUCCESS Time elapsed - +6.890000E-004 seconds
37 SQLExecDirect( )
38 <--- SQL_SUCCESS Time elapsed - +2.387800E-002 seconds
In the trace example, the database connection handle ( phDbc=0:1 ) was used to allocate a statement
handle ( phStmt=1:1 ) at line 32. An unprepared SQL statement was then executed on that statement
handle at line 34. If the TraceComm=1 keyword had been set in the db2cli.ini file, the
SQLExecDirect() function call trace entries would have shown additional client-server communication
information as follows:
SQLExecDirect( )
<--- SQL_SUCCESS Time elapsed - +2.384900E-002 seconds
41 SQLAllocStmt( phStmt=1:2 )
42 <--- SQL_SUCCESS Time elapsed - +6.820000E-004 seconds
46 SQLPrepare( )
47 <--- SQL_SUCCESS Time elapsed - +9.150000E-004 seconds
50 SQLBindParameter( )
51 <--- SQL_SUCCESS Time elapsed - +6.780000E-004 seconds
52 SQLExecute( hStmt=1:2 )
53 ---> Time elapsed - +1.337000E-003 seconds
54 ( iPar=1, fCType=SQL_C_CHAR, rgbValue="Hello World!!!", pcbValue=14,
piIndicatorPtr=14 )
55 SQLExecute( )
56 <--- SQL_ERROR Time elapsed - +5.951000E-003 seconds
In the trace example, the database connection handle ( phDbc=0:1 ) was used to allocate a second
statement handle ( phStmt=1:2 ) at line 41. An SQL statement with one parameter marker was then
prepared on that statement handle at line 43. Next, an input parameter ( iPar=1 ) of the appropriate SQL
type ( SQL_CHAR ) was bound to the parameter marker at line 48. Finally, the statement was executed at
line 52. Notice that both the contents and length of the input parameter ( rgbValue="Hello World!!!",
pcbValue=14 ) are displayed in the trace on line 54.
The SQLExecute() function fails at line 52. If the application calls a diagnostic CLI function like
SQLError() to diagnose the cause of the failure, then that cause will appear in the trace. For example:
The error message returned at line 59 contains the Db2 native error code that was generated
( SQL0302N ), the sqlstate that corresponds to that code ( SQLSTATE=22001 ) and a brief description of
the error. In this example, the source of the error is evident: on line 52, the application is trying to insert a
string with 14 characters into a column defined as VARCHAR(10) on line 34.
If the application does not respond to a CLI function warning or error return code by calling a diagnostic
function like SQLError(), the warning or error message should still be written to the CLI trace. However,
SQLDisconnect( hDbc=0:1 )
---> Time elapsed - +1.501000E-003 seconds
sqlccsend( ulBytes - 72 )
sqlccsend( Handle - 1084869448 )
sqlccsend( ) - rc - 0, time elapsed - +1.080000E-004
sqlccrecv( )
sqlccrecv( ulBytes - 27 ) - rc - 0, time elapsed - +1.717950E-001
( Unretrieved error message="SQL0302N The value of a host variable in the
EXECUTE or OPEN statement is too large for its corresponding use.
SQLSTATE=22001" )
SQLDisconnect( )
<--- SQL_SUCCESS Time elapsed - +1.734130E-001 seconds
The final part of a CLI trace should show the application releasing the database connection and
environment handles that it allocated earlier in the trace. For example:
61 SQLTransact( )
<--- SQL_SUCCESS Time elapsed - +2.220750E-001 seconds
62 SQLDisconnect( hDbc=0:1 )
63 ---> Time elapsed - +1.511000E-003 seconds
64 SQLDisconnect( )
65 <--- SQL_SUCCESS Time elapsed - +1.531340E-001 seconds
66 SQLFreeConnect( hDbc=0:1 )
67 ---> Time elapsed - +2.389000E-003 seconds
68 SQLFreeConnect( )
69 <--- SQL_SUCCESS Time elapsed - +3.140000E-004 seconds
70 SQLFreeEnv( hEnv=0:1 )
71 ---> Time elapsed - +1.129000E-003 seconds
72 SQLFreeEnv( )
73 <--- SQL_SUCCESS Time elapsed - +2.870000E-004 seconds
Platform-specific tools
Diagnostic tools (Windows)
Three useful diagnostic tools on Windows systems are described.
The following diagnostic tools are available for Windows operating systems:
Event viewer, performance monitor, and other administrative tools
The Administrative Tools folder provides a variety of diagnostic information, including access to the
event log and access to performance information.
Task Manager
The Task Manager shows all of the processes running on the Windows server, along with details about
memory usage. Use this tool to find out which Db2 processes are running, and to diagnose
performance problems. Using this tool, you can determine memory usage, memory limits, swapper
space used, and memory leakage for a process.
To open the Task Manager, press Ctrl + Alt + Delete, and click Task Manager from the available
options.
Dr. Watson™
The Dr. Watson utility is invoked in the event of a General Protection Fault (GPF). It logs data that
might help in diagnosing a problem, and saves this information to a file. You must start this utility by
typing drwatson on the command line.
lsattr -l sys0 -E
xmperf
For AIX systems using Motif, this command starts a graphical monitor that collects and displays
system-related performance data. The monitor displays three-dimensional diagrams for each
database partition in a single window, and is good for high-level monitoring. However, if activity is low,
the output from this monitor is of limited value.
spmon
If you are using system partitioning as part of the Parallel System Support Program (PSSP), you might
need to check if the SP Switch is running on all workstations. To view the status of all database
partitions, use one of the following commands from the control workstation:
• spmon -d for ASCII output
• spmon -g for a graphical user interface
Alternatively, use the command netstat -i from a database partition workstation to see if the
switch is down. If the switch is down, there is an asterisk (*) beside the database partition name. For
example:
Procedure
• To collect the base set of diagnostic information in a compressed file archive, enter the db2support
command:
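db2support output_directory -s -d database_name -c
Here, output_directory and database_name are placeholders for your own output path and database name.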
Using -s will give system details about the hardware used and the operating system. Using -d will give
details about the specified database. Using -c allows for an attempt to connect to the specified
database.
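For example, a representative invocation (the output directory "." and the database name sample are
placeholders) is:
db2support . -d sample -s -c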
The output is conveniently collected and stored in a compressed ZIP archive, db2support.zip, so
that it can be transferred and extracted easily on any system.
What to do next
For specific symptoms, or for problems in a specific part of the product, you might have to collect
additional data. Refer to the problem-specific "Collecting data" documents.
You can do any of the following tasks next:
• Analyze the data
• Submit the data to IBM Software Support
1. Reissue the failing command with tracing or debug mode enabled, for example:
db2setup -t trace.out
dascrt -u DASUSER -d
dasdrop -d
dasmigr -d
dasupdt -d
db2icrt -d INSTNAME
db2idrop INSTNAME -d
db2iupgrade -d INSTNAME
db2iupdt -d INSTNAME
2. Locate the diagnostic files. More than one file might be present, so compare the timestamps to ensure
that you are obtaining all of the appropriate files.
The output will be found in the /tmp directory by default.
Example file names are: dascrt.log, dasdrop.log, dasupdt.log, db2icrt.log.PID, db2idrop.log.PID,
db2iupgrade.log.PID, and db2iupdt.log.PID, where PID is the process ID.
3. Provide the diagnostic file(s) to IBM Software Support.
If the problem is that the db2start or START DATABASE MANAGER command is failing, look for a file
named db2start.timestamp.log in the insthome/sqllib/log directory, where insthome is the
home directory for the instance owner. Likewise if the problem is that the db2stop or STOP DATABASE
MANAGER command is failing, look for a file named db2stop.timestamp.log. These files will only be
found if the database manager did not respond to the command within the amount of time specified in the
start_stop_time database manager configuration parameter.
Symptoms
Depending on the specific problem that exists, you might observe some of the following symptoms on
your data server:
db2fodc -connections
db2fodc -cpu
The db2fodc -cpu command collects processor-related diagnostic data and places it in a
FODC_Cpu_timestamp_member directory, where timestamp is the time when the db2fodc -cpu
command was executed and member is the member or members the collection was performed for. As an
alternative, when the problem is intermittent or if you want to configure your system ahead of time to
collect diagnostic data when a specific problem condition exists, you can issue a variation of the
following command:
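A representative command is sketched below; the threshold rule names (us for user processor usage and
rqueue for the run queue) and the threshold values are assumptions chosen to match the description in
the next paragraph:
db2fodc -cpu -detect us">=90" rqueue">=1000" condition="and" interval="2" sleeptime="500" iteration="3" triggercount="3"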
The -detect parameter with the threshold rules specified delays the collection of processor-related
information until the trigger conditions specified by the threshold rule are detected. You can specify
your own rules for the -detect parameter to determine when to start diagnostic data collection. In the
previous example, conditions for user processor usage and the run queue must both be met three times
over the course of three iterations. This means that the trigger conditions must exist for 6 seconds total
to trigger diagnostic data collection on all members (a trigger count of 3 x 2 second intervals = a trigger
condition that must exist for 6 seconds). The iteration option specifies that trigger condition
detection followed by diagnostic data collection is performed three times, with a sleep time of 500
seconds between each iteration.
• To collect diagnostic data for problems related to memory usage when you are already observing
related problem symptoms, you can issue the following command:
db2fodc -memory
The db2fodc -memory command collects memory-related diagnostic data and places it in the
FODC_Memory_timestamp_member directory, where timestamp is the time when the db2fodc -memory
command was executed and member is the member or members the collection was
performed for. As an alternative when the problem is intermittent or if you want to configure your
system ahead of time to collect diagnostic data when a specific problem condition exists, you can issue
a variation of the following command:
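A representative command is sketched below; the threshold rule names (free for free memory and
connections for database connections) and the threshold values are assumptions chosen to match the
description in the next paragraph:
db2fodc -memory -member 3 -detect free"<=10" connections">=1000" interval="10" triggercount="4" iteration="10" duration="5"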
The -detect parameter with the rules specified delays collection until the rules are detected. In this
example, the trigger condition for free memory and the number of connections to the database must
exist for 40 seconds to trigger diagnostic data collection on member 3 (trigger count of 4 x 10 second
intervals = 40 seconds total). Ten iterations of detection and diagnostic data collection can be
performed, enabled over a duration of 5 hours.
• To only detect trigger conditions but not to perform diagnostic data collection, you can use the
db2fodc -detect command together with the nocollect option. An entry is logged in the db2diag
log files anytime the problem condition specified is detected as a threshold hit. If you choose not to
collect diagnostic data, then the db2diag log entries for these threshold hits must be used to
determine whether a problem condition that you created a threshold rule for was detected.
In this example, the problem condition specified is detected as a threshold hit four times on member 0.
Both the values you specified for each threshold rule and the actual values detected are logged.
Related information
Best practices: Troubleshooting Db2 servers
Procedure
To analyze diagnostic data, take the following actions:
• Have a clear understanding of how the various pieces of data relate to each other.
For example, if the data spans more than one system, keep your data well organized so that you know
which pieces of data come from which sources.
• Confirm that each piece of diagnostic data is relevant to the timing of the problem by checking
timestamps.
Note that data from different sources can have different timestamp formats; be sure to understand the
sequence of the different elements in each timestamp format so that you can tell when the different
events occurred.
• Determine which data sources are most likely to contain information about the problem, and start your
analysis there.
For example, if the problem is related to installation, start your analysis with the installation log files (if
any), rather than starting with the general product or operating system log files.
• The specific method of analysis is unique to each data source, but one tip that is applicable to most
traces and log files is to start by identifying the point in the data where the problem occurs. After you
identify that point, you can work backward in time through the data to unravel the root cause of the
problem.
• If you are investigating a problem for which you have comparative data from an environment that is
working and one that is not, start by comparing the operating system and product configuration details
for each environment.
2. [Optional] To terminate any applications that did not COMMIT or ROLLBACK during the timeout
period in Step 1 and any new applications which accessed the database after the timeout period
completed, issue the following command:
db2stop force
4. Restart the Db2 instance using either one of the following commands:
db2start
or
db2 start database manager
Diagnosis
Locate the FODC directory that is specified under the diagpath database manager configuration
parameter. The location of the FODC directory can also be confirmed by viewing the administration
notification or db2diag log files. Forward the FODC information to IBM Software Support.
Symptoms
When issuing multiple load operations you might notice that a load operation is taking longer to complete
than normal or appears to be hanging.
Utilities:
Address ID Type State Invoker Priority StartTime DBName NumPhases CurPhase Description
0x000000020120E2A0 1 LOAD 0 0 0 Fri May 13 12:44:34 SAMPLE 2 2 [LOADID: 16.2011-05-13-12.44.34.638811.0 (3;5)]...
Progress:
Address ID PhaseNum CompletedWork TotalWork StartTime Description
0x000000020120E600 1 1 0 bytes 0 bytes Fri May 13 12:44:34 SETUP
0x000000020120E7E0 1 2 0 rows 0 rows Fri May 13 12:44:34 LOAD
2. Issue the db2diag -g 'dataobj:=loadID' command, where loadID is the ID of the load operation
found in the previous step. This command displays all the diagnostic messages from the db2diag log
files that are related to the specified load operation. The following example shows what is displayed
when this command is issued with the load operation ID identified previously:
AUTHID : VIVMAK
After you complete these steps, you have enough information to identify the problematic load operation.
However, if you need more information about the load operation, issue the db2pd -db <dbname> -load
loadID="LOADID" stacks command to obtain a stack trace. The stacks option is available on UNIX
and Linux operating systems only. The following example shows what is displayed when this command is
issued on a sample database with the load operation ID identified previously:
$ db2pd -db sample -load loadID="LOADID: 16.2011-05-13-12.34.34.638811.0 (3;5)" stacks
Database Partition 0 -- Database SAMPLE -- Active -- Up 0 days 00:00:27 -- Date 05/13/2011 14:28:32
Node Number : 0
The db2pd -db <dbname> -load loadID="LOADID" stacks command displays all the EDU
information related to the load operation specified and produces stack trace files in the diagpath
directory.
You can use all the information retrieved to perform further troubleshooting techniques, such as
monitoring or terminating the load operation. Also, the information gathered might be requested by IBM
technical support to troubleshoot problematic load operations.
You can also use the collected information to run the db2trc command for further troubleshooting. To
run the db2trc command for a specific load operation by using the retrieved information:
1. Run the db2pd -load command to retrieve the application ID of the specific load operation that you
are interested in.
2. Run the db2trc -appid or db2trc -apphdl command to record further information about the load
operation.
In the following example, the application ID *LOCAL.vivmak.110513164421 of the load ID LOADID:
16.2011-05-13-12.44.34.638811.0 (3;5) from the previous example in this topic, is used to run
the db2trc command:
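A command of the following form (using the application ID shown previously) turns on the trace for that
specific application:
db2trc on -appid *LOCAL.vivmak.110513164421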
Trace is turned on
$ db2trc info
Platform : Linux/X
allocationCount : 2
DB2TRCD pid : 0
numSuspended : 0
Mask : *.*.*.*.*
Timestamps : disabled
In the next example, the application handle obtained from the output of the db2pd -load command is
used to change the trace options that are in effect for the db2trc command:
$ db2trc info
Marker : @TRACE@
Platform : Linux/X
allocationCount : 2
DB2TRCD pid : 0
numSuspended : 0
Mask : *.*.*.*.*
Timestamps : disabled
Procedure
1. If your task does not execute as expected, the first thing you should do is look for an execution status
record in the ADMIN_TASK_STATUS administrative view.
• If there is a record, examine the various values. In particular, pay attention to the STATUS,
INVOCATION, SQLCODE, SQLSTATE, SQLERRMC and RC columns. The values often identify the root
cause of the problem. A sample query follows this list.
• If there is no execution status record in the view, the task did not execute. There are a number of
possible explanations for this:
– The administrative task scheduler is disabled. Tasks will not execute if the administrative task
scheduler is disabled. To enable the scheduler, set the DB2_ATS_ENABLE registry variable.
– The task was removed. Someone may have removed the task. Confirm the task exists by
querying the ADMIN_TASK_LIST administrative view.
– The scheduler is unaware of the task. The administrative task scheduler looks for new and
updated tasks by connecting to each active database every five minutes. Until this period has
elapsed, the scheduler is unaware of your task. Wait at least five minutes.
– The database is inactive. The administrative task scheduler cannot retrieve or execute tasks
unless the database is active. Activate the database.
– The transaction is uncommitted. The administrative task scheduler ignores uncommitted tasks.
Be sure to commit after adding, updating, or removing a task.
– The schedule is invalid. The task's schedule might prevent the task from running. For example,
the task may have already reached the maximum number of invocations. Review the task's
schedule in the ADMIN_TASK_LIST view and update the schedule if necessary.
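The following query is a minimal sketch of how to retrieve the execution status columns mentioned in the
first bullet; the ADMIN_TASK_STATUS view is assumed to be in the SYSTOOLS schema:
SELECT STATUS, INVOCATION, SQLCODE, SQLSTATE, SQLERRMC, RC
FROM SYSTOOLS.ADMIN_TASK_STATUS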
2. If you cannot determine the cause of the problem by referring to the ADMIN_TASK_STATUS
administrative view, refer to the Db2 diagnostic log.
All critical errors are logged to the db2diag log file. Informational event messages are also logged by
the administrative task scheduler daemon during task execution. These errors and messages are
identified by the "Administrative Task Scheduler" component.
What to do next
Symptoms
When you attempt certain database operations such as dropping a database, terminating a database
connection, and creating a backup copy of a database, the operation fails and an error stating that the
database is currently in use might be returned.
Causes
This SQL1035N error message might be returned in one of the following scenarios:
1. There are open connections to the database preventing the attempted operation from succeeding.
This can occur in the following situations:
• Exclusive use was requested, but the database is already in use as a shared database by another
user (in the same process).
• Exclusive use was requested, but the database is already in use as an exclusive database. This
means that two different processes are trying to access the same database.
• The maximum number of connections to the database has been reached.
• The database is being used by another user on another system.
2. The database has been activated explicitly, preventing the operation from succeeding.
3. The database is active because it is in the WRITE SUSPEND state.
Symptoms
Potential disk space savings by enabling row compression on temporary tables are not being realized as
expected.
Causes
• This situation occurs mostly as a result of a large number of applications running at the same time and
creating temporary tables, each of which consumes a portion of the database manager memory. This
results in not enough memory being available to create the compression dictionary. Notification is not
given when this situation occurs.
• Rows are compressed using a dictionary-based approach according to an algorithm. If a row of a
temporary table is large enough to yield appreciable savings in disk space, the row will be compressed.
Small rows in temporary tables will not be compressed and this will account for the lack of expected
savings in disk storage space. Notification is not given when this situation occurs.
Risk
There is no risk to the system aside from row compression not being used on temporary tables with row
sizes that do not meet the threshold. There could be other adverse effects on the database manager if
available memory remains highly constrained.
Symptoms
It is possible that the log reader may encounter transient and permanent errors while reading log records
that contain compressed user data. Here are non-exhaustive example lists of the two classes of errors
that may be encountered while reading log records with compressed data (row images).
Transient errors:
• Table space access not allowed
• Unable to access the table (lock timeout)
• Out of memory (to load and store the required dictionary)
Permanent errors:
• Table space in which the table resides does not exist
• Table or table partition to which the log record belongs does not exist
• A dictionary does not exist for the table or table partition
• The log record contains row images compressed with a dictionary older than the dictionaries in the
table
Causes
It is possible that a replication solution, or any other log reader, may fall behind database activities and
receive an error reading a log record which contains compressed user data (see Scenario 1). Such a case
can result in either a transient or a permanent error, as the following scenarios illustrate.
Scenario 1:
Since a data compression dictionary already exists for table t6, the two INSERTs after the ALTER will be
compressed (using Table t6's compression dictionary). At this point, the log reader has not yet reached
the first INSERT statement.
The following REORG TABLE command causes a new compression dictionary to be built for table t6, and
the current compression dictionary is kept as the historical dictionary, thus making the log reader one
dictionary behind the current compression dictionary (however, the historical dictionary is not loaded into
memory after the REORG):
As the log reader is reading the INSERT log for the INSERT statements, which now requires the historical
dictionary to be read in memory, the table t6 is undergoing a LOAD operation:
-> db2 load from data.del of del insert into table t6 allow no access
When the LOAD is executed on the source table, table t6 will be Z-locked due to the specified ALLOW NO
ACCESS option. The log reader must load the historical dictionary into memory to decompress row images
found in the INSERT log records; however, fetching the dictionary requires an IN table lock. In this case,
the log reader will fail to acquire the lock, and the sqlcode member of the db2ReadLogFilterData
structure returns SQL code SQL2048N. This corresponds to a transient error
(that is, the log record might be decompressed if the API is called again). The log reader will return the
compressed row image in the log record and continue on reading the next log record.
Scenario 2:
Table t7 has the DATA CAPTURE CHANGES attribute enabled. Compression is enabled for the table in
order to reduce storage costs. The table is being replicated by a data replication application, however, the
log reader has fallen behind on the source table activity and the data compression dictionary has already
been rebuilt twice before the log reader reads from the log records again.
The following statements are executed against Table t7, with the DATA CAPTURE CHANGES attribute
already enabled, table compression is enabled, and a new dictionary is built:
The db2ReadLog API will not be able to decompress the contents of the log record in this case, because
the log reader has fallen behind two or more REORG RESETDICTIONARY operations. The dictionary
required to decompress the row image in the log record would not be found in the table; only the
compression dictionary of the second REORG and the compression dictionary of the last REORG are stored
with the table. However, the db2ReadLog API would not fail with an error. Instead, the uncompressed
row image will be returned in the user buffer, and, in the db2ReadLogFilterData structure preceding the
log record, the sqlcode member will return SQL code SQL0204N. This code corresponds to a permanent
error (that is, the log record cannot ever be decompressed).
Environment
This failure to successfully decompress a compressed log record, due to a missing old compression
dictionary, can occur on any platform on which a data replication solution uses the db2ReadLog API and
the DATA CAPTURE CHANGES attribute is set for the table.
For transient errors, it may be possible to reissue the read request and successfully read the log. For
example, if the log record belongs to a table residing in a table space and access to the table is not
allowed, the dictionary may not be accessible to decompress the log record (see Scenario 1). The table
space may become available at a later time, and reissuing the log read request at that time may
successfully decompress the log record.
• If a transient error is returned (see Scenario 1), read the error information in order to take appropriate
action. This may include waiting for the table operation to complete, which could allow a re-read of the
log record and decompression to be successful.
• If a permanent error occurs (see Scenario 2), the row image in the log record cannot be decompressed
since the compression dictionary, which was used to compress the row image, is no longer available.
For this case, replication solutions may need to re-initialize the affected (target) table.
Scenario 1
References to global variables must be properly qualified. It is possible that there exists a variable with
the same name and a different schema where the incorrect schema is encountered earlier in the PATH
register value. One solution is to ensure that the references to the global variable are fully qualified.
########################################################################
# developerUser connects to database and creates needed objects
########################################################################
########################################################################
# secadmUser grants setsessionuser
########################################################################
db2 "connect to sample user secadmUser using xxxxxxxx"
db2 "grant setsessionuser on user finalUser to user developerUser"
db2 "terminate"
########################################################################
# developerUser will debug the problem now
########################################################################
echo "------------------------------------------------------------"
echo " Connect as developerUser "
echo "------------------------------------------------------------"
db2 "connect to sample user developerUser using xxxxxxxx"
echo "------------------------------------------------------------"
echo " SET SESSION AUTHORIZATION = finalUser "
echo "------------------------------------------------------------"
db2 "set session authorization = finalUser"
echo "------------------------------------------------------------"
echo " SET SESSION AUTHORIZATION = developerUser "
echo "------------------------------------------------------------"
db2 "terminate"
Troubleshooting inconsistencies
Troubleshooting data inconsistencies
Diagnosing where data inconsistencies exist within the database is very important. One way to determine
data inconsistencies is to use the output from the INSPECT command to identify where a problem exists.
When inconsistencies are found, you will have to decide how to deal with the problem.
Once you have determined that there is a data consistency problem, you have two options:
• Contact IBM Software Support and ask for their assistance in recovering from the data inconsistency
• Drop and rebuild the database object that has the data consistency problem.
You will use the INSPECT CHECK variation of the INSPECT command to check the database, table
space, or table that has evidence of a data inconsistency. Once the results of the INSPECT CHECK
command are produced, you should format the inspection results using the db2inspf command.
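For example, a minimal sketch of checking an entire database and then formatting the results (file names
are placeholders) is:
db2 inspect check database results keep tbschk.log
db2inspf tbschk.log tbschk_formatted.out
The unformatted results file is written to the diagnostic data directory path (diagpath).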
If the INSPECT command does not finish, then contact IBM Software Support.
Locking implications
While checking for index to data inconsistencies by using the INSPECT command with the INDEXDATA
option, the inspected tables are only locked in IS mode.
When the INDEXDATA option is specified, by default only the values of explicitly specified level clause
options are used. For any level clause options which are not explicitly specified, the default levels (INDEX
NORMAL and DATA NORMAL) are overwritten from NORMAL to NONE.
Causes
The possible causes of this are:
• The database is offline as a result of an abnormal termination of the previous session (for example, a
power failure).
• If the error was encountered when issuing the db2ckupgrade command:
– The database is online and SQL has been issued which modified data in the database.
– The database is online and HADR has been enabled.
• In Db2 pureScale environments only, possible causes also include:
– The database on this Db2 member is offline as a result of an abnormal termination of the previous
session.
– The database is offline across the entire Db2 pureScale instance as a result of an abnormal
termination of the previous session.
– If drop operations are done in the instance, recoverable databases are already put into backup
pending state. Drop operations are not allowed until a backup of the database is done.
– An attempt was made to modify the cluster topology (for example, adding or deleting a member)
while the database was in one of the following states: backup pending, restore pending, rollforward
pending.
Procedure
To troubleshoot installation problems for Db2 database systems:
• Ensure that your system meets all of the installation requirements.
• If you are encountering licensing errors, ensure that you have applied the appropriate licenses.
Review the list of frequently asked questions in the "Knowledge Collection: Db2 license issues"
technote: http://www.ibm.com/support/docview.wss?rs=71&uid=swg21322757
• Review the list of installation issues in the documentation and on the Db2 Technical Support website:
www.ibm.com/software/data/db2/support/db2_9/troubleshoot.html
What to do next
If you complete these steps but cannot yet identify the source of the problem, begin collecting diagnostic
data to obtain more information.
db2setup -t /filepath/trace.out
setup -t \filepath\trace.out
Procedure
1. Ensure that you are looking at the appropriate installation log file. Check the file's creation date, or the
timestamp included in the file name (on Windows operating systems).
2. Determine whether the installation completed successfully.
• On Windows operating systems, success is indicated by a message similar to the following at the
bottom of the installation log file:
• On Linux and UNIX operating systems, success is indicated by a message at the bottom of the
installation log file (the one named db2setup.log by default).
3. OPTIONAL: Determine whether any errors occurred. If the installation completed successfully, but you
received an error message during the installation process, locate these errors in the installation log
file.
• On Windows operating systems, most errors will be prefaced with "ERROR:" or "WARNING:". For
example:
• On Linux and UNIX operating systems, a file with a default name of db2setup.err will be present if
any errors were returned by Java (for example, exceptions and trap information).
If you had enabled an installation trace, there will be more entries in the installation log files and the
entries will be more detailed.
Results
If analyzing this data does not help you to resolve your problem, and if you have a maintenance contract
with IBM Software Support, you can open a problem report. IBM Software Support will ask you to submit
any data that you have collected, and they might also ask you about any analysis that you performed.
Symptoms
While creating or upgrading an instance, you might receive an error when trying to update the DBM CFG as
part of instance creation. The error code that you might receive is DBI1281E. However, this error might
not give the root cause of the problem, and diagnostic information is needed to further troubleshoot the
instance creation problem.
Errors when installing a Db2 database product as a non-root user to the default path on a system WPAR
(AIX)
Various errors can occur if you install Db2 database products as a non-root user in the default installation
path (/opt/IBM/db2/V9.7) on a system workload partition (WPAR) on AIX 6.1. To avoid these
problems, install Db2 database products on a file system that is accessible only to the WPAR.
Symptoms
If you install Db2 database products in the /usr or /opt directories on a system WPAR, various errors
can occur depending on how you configured the directories. System WPARs can be configured to either
share the /usr and /opt directories with the global environment (in which case the /usr and /opt
directories will be readable but not write accessible from the WPAR) or to have a local copy of the /usr
and /opt directories.
In the first scenario, if a Db2 database product is installed to the default path on the global environment,
that installation will be visible in the system WPAR. This will give the appearance that Db2 is installed on
the WPAR; however, attempts to create a Db2 instance will result in this error: DBI1288E The
execution of the program db2icrt failed. This program failed because you do not
have write permission on the directory or file /opt/IBM/db2/V9.7/
profiles.reg,/opt/IBM/db2/V9.7/default.env.
In the second scenario, if a Db2 database product is installed to the default path on the global
environment then when the WPAR creates the local copy of the /usr and /opt directories the Db2
database product installation will also be copied. This can cause unexpected problems if a system
administrator attempts to use the database system. Since the Db2 database product was intended for
another system, inaccurate information might be copied over. For example, any Db2 instances originally
created on the global environment will appear to be present in the WPAR. This can cause confusion for
the system administrator with respect to which instances are actually installed on the system.
Causes
These problems are caused by installing Db2 database products in /usr or /opt directories on a system
WPAR.
Resolving service name errors when you install Db2 database products
If you choose a non-default service name or port number for the Db2 database product to use, ensure
that you do not specify values that are already in use.
Symptoms
When you attempt to install a Db2 database product, the Db2 Setup wizard reports an error that states
"The service name specified is in use".
Causes
The Db2 Setup wizard will prompt you to choose port numbers and service names when you install:
• A Db2 database product that will accept TCP/IP communications from clients
• A Db2 database product that will act as a database partition server
This error can occur if you choose a service name and port number rather than accepting the default
values. If you choose a service name that already exists in the services file on the system and you only
change the port number, this error will occur.
The following steps assume that you have used the db2licm command to generate a Db2 license
compliance report.
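For example, a compliance report can be generated with a command of the following form (the output file
name is a placeholder):
db2licm -g compliance_report.txt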
Procedure
1. Open the file that contains the Db2 license compliance report.
– Check whether there are any multidimensional cluster tables. Run the following command
against every database in every instance in the Db2 copy:
– Check whether any of your instances use query parallelism (also known as interquery
parallelism). Run the following command once in each instance in the Db2 copy:
– Check if connection concentrator is enabled. Run the following command against every instance
in the Db2 copy:
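For example, these parameter values can be displayed with:
db2 get dbm cfg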
This command displays the current values of database manager configuration parameters,
including MAX_CONNECTIONS and MAX_COORDAGENTS. If the value of MAX_CONNECTIONS is
greater than the value of MAX_COORDAGENTS, then connection concentrator is enabled. If you are
not using Db2 Enterprise Server Edition, Db2 Advanced Enterprise Server Edition, or Db2 Connect
Server products, ensure that connection concentrator is disabled.
– Check if any indexes have compression enabled. Run the following command against every
database in every instance in the Db2 copy:
– Check if any compression dictionary still exists for a table that has row level compression
deactivated. Run the following command against every database in every instance in the Db2
copy:
Note: This query might be resource intensive and might take a long time to run. Run this query
only if Storage Optimization license violations are being reported even though there are no tables
that have row level compression enabled.
Introduction
A locking problem is the proper diagnosis if you are experiencing a failure of applications to complete
their tasks or a slowdown in the performance of SQL queries due to locks. Therefore, the ideal objective
is not to have any lock timeouts or deadlocks on a database system, both of which result in applications
failing to complete their tasks.
Lock waits are normal expected events, but if the time spent waiting for a lock becomes large, then lock
waits can slow down both SQL query performance and completion of an application. Excessive lock wait
durations have a risk of becoming lock timeouts which result in the application not completing its tasks.
Lock escalations are a consideration as a locking problem when they contribute to causing lock timeouts.
Ideally, the objective is not to have any lock escalations, but a small number can be acceptable if adverse
effects are not occurring.
It is suggested that you monitor lock wait, lock timeout, and deadlock locking events at all times; typically
at the workload level for lock waits, and at the database level for lock timeouts and deadlocks.
The diagnosis of the type of locking problem that is occurring and its resolution begins with the collection
of information and looking for diagnostic indicators. The following sections help to guide you through this
process.
Collect information
In general, to be able to objectively assess that your system is demonstrating abnormal behavior which
can include processing delays and poor performance, you must have information that describes the
typical behavior (baseline) of your system. A comparison can then be made between your observations of
suspected abnormal behavior and the baseline. Collecting baseline data, by scheduling periodic
operational monitoring tasks, is a key component of the troubleshooting process. For more detailed
information about establishing the baseline operation of your system, see the information about
operational monitoring of system performance in the Db2 performance documentation.
What to do next
After having diagnosed that lock waits are likely causing the problem you are experiencing, take steps to
resolve the issue: “Resolving lock wait problems” on page 537
Confirm that you are experiencing a lock wait problem by taking the necessary diagnostic steps for
locking problems outlined in “Diagnosing and resolving locking problems” on page 534.
The guidelines provided here can help you to resolve the lock wait problem you are experiencing and help
you to prevent such future incidents.
Procedure
Use the following steps to diagnose the cause of the unacceptable lock wait problem and to apply a
remedy:
1. Obtain information from the administration notification log about all tables where agents are spending
long periods of time waiting for locks.
2. Use the information in the administration notification log to decide how to resolve the lock wait
problem. There are a number of guidelines that help to reduce lock contention and lock wait time.
Consider the following options:
• If possible, avoid very long transactions and WITH HOLD cursors. The longer locks are held, the
more chance that they cause contention with other applications. This is only an issue if you are using
a high isolation level.
• It is best practice to commit the following actions as soon as possible:
– Write actions such as delete, insert, and update
– Data definition language (DDL) statements, for example ALTER, CREATE, and DROP statements
– BIND and REBIND commands
• After issuing ALTER or DROP DDL statements, run the SYSPROC.ADMIN_REVALIDATE_DB_OBJECTS
procedure to revalidate any data objects and the db2rbind command to rebind any packages.
• Avoid fetching result sets that are larger than necessary, especially under the repeatable read (RR)
isolation level. The more that rows are touched, the more locks are held, and the greater the
opportunity to run into a lock that is held by someone else. In practical terms, this often means
pushing down row selection criteria into a WHERE clause of the SELECT statement, rather than
bringing back more rows and filtering them at the application. For example (table and column names are
illustrative, based on the SAMPLE database):
SELECT * FROM EMPLOYEE
==>
SELECT * FROM EMPLOYEE WHERE WORKDEPT = 'A01'
• Avoid using higher isolation levels than necessary. Repeatable read might be necessary to preserve
result set integrity in your application; however, it does incur extra cost in terms of locks held and
potential lock conflicts.
• If appropriate for the business logic in the application, consider modifying locking behavior through
the DB2_EVALUNCOMMITTED, DB2_SKIPDELETED, and DB2_SKIPINSERTED registry variables (a
sample db2set invocation follows this list). These registry variables enable the Db2 database manager
to delay or avoid taking locks in some circumstances, thereby reducing contention and potentially
improving throughput.
• Eliminate lock escalations wherever possible.
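A minimal sketch of enabling these registry variables with db2set is shown below; note that registry
variable changes generally do not take effect until the instance is restarted:
db2set DB2_EVALUNCOMMITTED=ON
db2set DB2_SKIPDELETED=ON
db2set DB2_SKIPINSERTED=ON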
What to do next
Rerun the application or applications to ensure that the locking problem has been eliminated by checking
the administration notification log for lock-related entries or checking the lock wait and lock wait time
metrics for the appropriate workload, connection, service subclass, unit of work, and activity levels.
In general, any observed deadlock is considered abnormal. To be able to objectively assess that your
system is demonstrating abnormal behavior which can include processing delays and poor performance,
you must have information that describes the typical behavior (baseline) of your system. A comparison
can then be made between your observations of suspected abnormal behavior and the baseline.
Collecting baseline data, by scheduling periodic operational monitoring tasks, is a key component of the
troubleshooting process.
For instructions about how to monitor deadlock locking events, see: "Monitoring locking events" in
Database Monitoring Guide and Reference.
Diagnosis
A deadlock is created when two applications lock data that is needed by the other, resulting in a
situation in which neither application can continue executing without the intervention of the deadlock
detector. The victim application has to re-execute the transaction from the beginning after the system
automatically rolls back the previous deadlocked transaction. Monitoring the rate at which this
happens helps avoid the case where many deadlocks drive significant extra load on the system
without the DBA being aware.
Indicative signs
Look for the following indicative signs of deadlocks:
• One or more applications are occasionally re-executing transactions
• Deadlock message entries in the administration notification log
• Increased number of deadlocks displayed for the deadlocks monitor element
What to do next
After having diagnosed that deadlocks are likely causing the problem you are experiencing, take steps to
resolve the issue: “Resolving deadlock problems” on page 539
Confirm that you are experiencing a deadlock problem by taking the necessary diagnostic steps for
locking problems outlined in “Diagnosing and resolving locking problems” on page 534.
The guidelines provided here can help you to resolve the deadlock problem you are experiencing and help
you to prevent such future incidents.
Procedure
Use the following steps to diagnose the cause of the unacceptable deadlock problem and to apply a
remedy:
What to do next
Rerun the application or applications to ensure that the locking problem has been eliminated by checking
the administration notification log for lock-related entries.
In general, to be able to objectively assess that your system is demonstrating abnormal behavior which
can include processing delays and poor performance, you must have information that describes the
typical behavior (baseline) of your system. A comparison can then be made between your observations of
suspected abnormal behavior and the baseline. Collecting baseline data, by scheduling periodic
operational monitoring tasks, is a key component of the troubleshooting process.
Diagnosis
Sometimes, lock wait situations lead to lock timeouts that cause transactions to be rolled back. The
period of time until a lock wait leads to a lock timeout is specified by the database configuration
parameter locktimeout. Lock timeouts, in excessive numbers, can be as disruptive to a system as
deadlocks. Although deadlocks are comparatively rare in most production systems, lock timeouts can
be more common. The application usually has to handle them in a similar way: re-executing the
transaction from the beginning. Monitoring the rate at which this happens helps avoid the case where
many lock timeouts drive significant extra load on the system without the DBA being aware.
Indicative signs
Look for the following indicative signs of lock timeouts:
• An application is frequently re-executing transactions
• lock_timeouts monitor element value is climbing
• Lock timeout message entries in the administration notification log
What to monitor
Due to the relatively transient nature of locking events, lock event data is most valuable if
collected periodically over a period of time, so that the evolving picture can be better understood.
You can monitor the administration notification log for lock timeout messages.
Note: To enable lock timeout messages to be written to the administration notification log file, set
the mon_lck_msg_lvl database configuration parameter to a value of 3.
Create an event monitor to capture lock timeout data for a workload or database.
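For example, a minimal sketch (the database name sample and the event monitor name are illustrative) is:
db2 update db cfg for sample using MON_LCK_MSG_LVL 3
db2 "create event monitor locktimeoutmon for locking write to unformatted event table"
db2 "set event monitor locktimeoutmon state 1"
Lock timeout data collection must also be enabled at the workload or database level, for example through
the mon_locktimeout database configuration parameter.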
These are the key indicator monitoring elements:
• lock_timeouts value is climbing
• int_rollbacks value is climbing
If you have observed one or more of the indicative signs listed here, then you are likely experiencing a
problem with lock timeouts. Follow the link in the "What to do next" section to resolve this issue.
What to do next
After having diagnosed that lock timeouts are likely causing the problem you are experiencing, take steps
to resolve the issue: “Resolving lock timeout problems” on page 541
Confirm that you are experiencing a lock timeout problem by taking the necessary diagnostic steps for
locking problems outlined in “Diagnosing and resolving locking problems” on page 534.
The guidelines provided here can help you to resolve the lock timeout problem you are experiencing and
help you to prevent such future incidents.
• Avoid using higher isolation levels than necessary. Repeatable read might be necessary to preserve
result set integrity in your application; however, it does incur extra cost in terms of locks held and
potential lock conflicts.
• If appropriate for the business logic in the application, consider modifying locking behavior through
the DB2_EVALUNCOMMITTED, DB2_SKIPDELETED, and DB2_SKIPINSERTED registry variables.
These registry variables enable Db2 database manager to delay or avoid taking locks in some
circumstances, thereby reducing contention and potentially improving throughput.
What to do next
Rerun the application or applications to ensure that the locking problem has been eliminated by checking
the administration notification log for lock-related entries or checking the lock wait and lock wait time
metrics for the appropriate workload, connection, service subclass, unit of work, and activity levels.
In general, to be able to objectively assess that your system is demonstrating abnormal behavior which
can include processing delays and poor performance, you must have information that describes the
typical behavior (baseline) of your system. A comparison can then be made between your observations of
suspected abnormal behavior and the baseline. Collecting baseline data, by scheduling periodic
operational monitoring tasks, is a key component of the troubleshooting process. For more detailed
information about establishing the baseline operation of your system, see the information about
operational monitoring of system performance in the Db2 performance documentation.
Diagnosis
Lock escalation from multiple row-level locks to a single table-level lock can occur for the following
reasons:
• The total amount of memory consumed by many row-level locks held against a table exceeds the
percentage of total memory allocated for storing locks
• The lock list runs out of space. The application that caused the lock list to be exhausted will have its
locks forced through the lock escalation process, even if it is not the application holding the most
locks.
The threshold percentage of total memory allocated for storing locks, that has to be exceeded by an
application for a lock escalation to occur, is defined by the maxlocks database configuration
parameter and the allocated memory for locks is defined by the locklist database configuration
parameter. In a well-configured database, lock escalation is rare. If lock escalation reduces
concurrency to an unacceptable level, you can analyze the problem and decide on the best course of
action.
Lock escalation is less of an issue, from the memory space perspective, if self tuning memory
manager (STMM) is managing the memory for locks that is otherwise only allocated by the locklist
database configuration parameter. STMM will automatically adjust the memory space for locks if it
ever runs out of free memory space.
Indicative signs
Look for the following indicative signs of lock escalations:
• Lock escalation message entries in the administration notification log
What to monitor
Due to the relatively transient nature of locking events, lock event data is most valuable if
collected periodically over a period of time, so that the evolving picture can be better understood.
Check this monitoring element for indications that lock escalations might be a contributing factor
in the SQL query performance slow down:
• lock_escals
If you have observed one or more of the indicative signs listed here, then you are likely experiencing a
problem with lock escalations. Follow the link in the "What to do next" section to resolve this issue.
What to do next
After having diagnosed that lock escalations are likely causing the problem you are experiencing, take
steps to resolve the issue: “Resolving lock escalation problems” on page 544
Confirm that you are experiencing a lock escalation problem by taking the necessary diagnostic steps for
locking problems outlined in “Diagnosing and resolving locking problems” on page 534.
The guidelines provided here can help you to resolve the lock escalation problem you are experiencing
and help you to prevent such future incidents.
The objective is to minimize lock escalations, or eliminate them, if possible. A combination of good
application design and database configuration for lock handling can minimize or eliminate lock
escalations. Lock escalations can lead to reduced concurrency and potential lock timeouts, so addressing
lock escalations is an important task. The lock_escals monitor element and messages written to the
administration notification log can be used to identify and correct lock escalations.
First, ensure that lock escalation information is being recorded. Set the value of the mon_lck_msg_lvl
database configuration parameter to 1. This is the default setting. When a lock escalation event occurs,
information regarding the lock, workload, application, table, and error SQLCODEs are recorded. The query
is also logged if it is a currently executing dynamic SQL statement.
Procedure
Use the following steps to diagnose the cause of the unacceptable lock escalation problem and to apply a
remedy:
1. Gather information from the administration notification log about all tables whose locks have been
escalated and the applications involved.
This log file includes the following information:
• The number of locks currently held
• The number of locks needed before lock escalation is completed
• The table identifier and table name of each table being escalated
• The number of non-table locks currently held
• The new table-level lock to be acquired as part of the escalation. Usually, an S or X lock is acquired.
• The internal return code that is associated with the acquisition of the new table-level lock
2. Use the administration notification log information about the applications involved in the lock
escalations to decide how to resolve the escalation problems.
Consider the following options:
• You can enable the DB2_AVOID_LOCK_ESCALATION registry variable to return SQL0912N to the
application, instead of performing lock escalation. The application then has an opportunity to either
COMMIT or ROLLBACK which will release the locks held by this application.
• Check and possibly adjust either the maxlocks or locklist database configuration parameters, or
both. In a partitioned database system, make this change on all database partitions. The value of the
locklist configuration parameter may be too small for your current workload. If multiple
applications are experiencing lock escalation, this could be an indication that the lock list size needs
to be increased. Growth in workloads or the addition of new applications could cause the lock list to
be too small. If only one application is experiencing lock escalations, then adjusting the maxlocks
configuration parameter could resolve this. However, you may want to consider increasing
locklist at the same time you increase maxlocks - if one application is allowed to use more of the
lock list, all remaining applications share less lock memory and might themselves run into escalations.
Each idle connection caches about 23 lock request blocks (LRBs), or about 3000 bytes of locking
memory. This means that, in some extreme situations with a very large number of connections, the
cached lock memory alone could cause you to reach the database limit on lock memory (locklist)
and, as a result, cause pervasive lock escalation problems unless you lower the maxlocks parameter
or increase the locklist parameter.
What to do next
Rerun the application or applications to ensure that the locking problem has been eliminated by checking
the administration notification log for lock-related entries.
Symptoms
An SQL query runs reasonably well on its own, but then slows down in a production system when other
queries are running at the same time. This may occur every day at a specific time, or seemingly at
random.
Looking at the SQL statement there are no apparent reasons for the intermittent performance problem.
Causes
Often these types of performance issues are a result of sort heap memory allocation problems.
Performance overview
Performance refers to the way that a computer system behaves in response to a particular workload.
Performance is measured in terms of system response time, throughput, and resource utilization.
Performance is also affected by:
• The resources that are available on the system
This DDL is included in the EXPLAIN.DDL file located in the misc subdirectory of the sqllib directory.
Symptoms
Various error messages might occur, depending on the circumstances. For example, the following error
can occur when you create a database: SQL1229N The current transaction has been rolled back because
of a system error. SQLSTATE=40504
Causes
The problem is caused by the presence of an entry for the IP address 127.0.0.2 in the /etc/hosts file,
where 127.0.0.2 maps to the fully qualified hostname of the machine. For example:
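An entry of the following form (the host name is illustrative) triggers the problem:
127.0.0.2   db2host01.example.com   db2host01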
Environment
The problem is limited to partitioned database environments.
Symptoms
If you attempt to create a database on an encrypted file system in a multiple partition database
environment, you will receive the following error: SQL10004C An I/O error occurred while
accessing the database directory. SQLSTATE=58031
Causes
At this time it is not possible to create a partitioned database environment using EFS (encrypted file
systems) on AIX. Because database partitions use rsh or ssh, the keystore in EFS is lost and
database partitions are unable to access the database files that are stored on the encrypted file system.
Symptoms
If table states cause the redistribution to fail, the error message indicates that the database partition
group cannot be redistributed or that the operation is not allowed. For example, SQL02436N, SQL6056N,
and SQL0668N messages can be symptoms of this problem.
Note: If the error message lists a table name, it might not be the only problematic table in the database
partition group. By troubleshooting the table states for all of the tables in the database partition group,
you can avoid multiple unsuccessful redistribution attempts.
SELECT TABNAME
FROM SYSCAT.TABLES AS TABLES, SYSCAT.TABLESPACES AS TABLESPACES
WHERE TABLES.TBSPACE = TABLESPACES.TBSPACE AND TABLES.STATUS = 'X'
AND TABLESPACES.DBPGNAME = 'IBMDEFAULTGROUP'
SELECT TABNAME
FROM SYSCAT.TABLES AS TABLES, SYSCAT.TABLESPACES AS TABLESPACES
WHERE TABLES.TBSPACE = TABLESPACES.TBSPACE AND TABLES.STATUS = 'C'
AND TABLESPACES.DBPGNAME = 'IBMDEFAULTGROUP'
where IBMDEFAULTGROUP is the database partition group name. If this query takes a long time to
execute, terminate the query, issue the RUNSTATS command on all of the tables involved in this query,
and then reissue the query.
Troubleshooting scripts
You may have internal tools or scripts that are based on the processes running in the database engine.
These tools or scripts may no longer work because all agents, prefetchers, and page cleaners are now
considered threads in a single, multi-threaded process.
Your internal tools and scripts will have to be modified to account for a threaded process. For example,
you may have scripts that start the ps command to list the process names and then perform tasks
against certain agent processes. Your scripts must be rewritten.
The problem determination database command db2pd will have a new option -edu (short for "engine
dispatchable unit") to list all agent names along with their thread IDs. The db2pd -stack command
continues to work with the threaded engine to dump individual EDU stacks or to dump all EDU stacks for
the current node.
Recompile the static section to collect section actuals after applying Fix Pack 1
After applying DB2® Version 9.7 Fix Pack 1, section actuals cannot be collected for a static section
compiled prior to applying the fix pack. The static section must be recompiled to collect section actuals
after applying Fix Pack 1.
Symptoms
Section actuals are not collected when the EXPLAIN_FROM_ACTIVITY routine is executed.
• The value given for the registry variable is not valid. Refer to the registry variable usage for
DB2_MEMORY_PROTECT for information about valid values.
• The hardware and operating system may not support storage protection keys and the feature cannot be
enabled.
Next steps
The result of stepping through the available diagnostic information will determine what troubleshooting
scenarios you should look at next. Once you have narrowed down the scope of the problem, you can
navigate through the Db2 pureScale troubleshooting documentation to find the context that most likely
applies to your problem definition. Often, you will find two types of troubleshooting content: very context-
specific answers to frequently asked troubleshooting questions (FAQs), and more comprehensive
troubleshooting scenarios that show you how to interpret the diagnostic data and resolve the problem.
For example, if the diagnostic information you looked at shows that a member has an alert and is waiting
to fail back to its home host after having failed over to a guest host for recovery purposes (an operation
known as a restart light), you can locate the associated failure scenario in the troubleshooting
documentation by looking at the Db2 pureScale instance operation scenarios, and then at the subsection
for a member or host with an alert. Not all possible failure scenarios are covered, but many are.
DB2CF_<instanceName>_MGMT <mgmnt_port>/tcp
DB2CF_<instanceName> <port>/tcp
The <instanceName> is the name of the Db2 pureScale instance. The <port> is the port value that is used
for the uDAPL connection. The <mgmnt_port> is the port value that is used for the TCP/IP management
port connectivity.
Note: During instance creation, unused port values are detected and required entries are created in
the /etc/services file to reserve the port values for the Db2 instance.
– On Linux:
export DISPLAY=IP_Address:0.0
where IP_Address represents the IP address of the X Window client machine you are using to launch
the installation.
Procedure
To gather diagnostic data:
1. Use the information in the DBI2047E error message to determine the host or hosts where the failure
occurred.
2. Issue the db2support command in one of the following ways:
• To collect diagnostic data on the local host, issue the following command:
db2support -install
• To collect diagnostic data on a remote host, issue a command of the following form:
db2support -install -host hostname
where hostname is the name of the remote host for which you want to collect diagnostic data. For
example, to collect diagnostic data on the host hotellnx96, issue the following command:
db2support -install -host hotellnx96
• To collect diagnostic data on multiple hosts, issue a command of the following form:
db2support -install -host hostname_list
where hostname_list is a comma-separated list of hosts for which you want to collect diagnostic
data. For example, to collect diagnostic data on the hosts hotellnx96, hotellnx97, and hotellnx98,
issue the following command:
db2support -install -host hotellnx96,hotellnx97,hotellnx98
Results
The diagnostic data is collected in the db2support.zip file. The file is created in the current directory, if
the current directory is writeable; otherwise, the file is placed in your home directory.
Example
The following example shows the typical output of the db2support -install command. In this case,
diagnostic data is collected on the local host.
What to do next
If you are working with IBM Support to troubleshoot a Db2 installation or instance-creation problem, you
might be given instructions on how to upload the db2support.zip file for analysis by support
personnel.
Symptoms
The initial symptom is the following error that is returned during the instance-creation step of the
installation.
Diagnosis / resolution
• Check the /tmp/db2diag.log for messages similar to the following
– Line # : 6884---2610-403 The resource is stale. or
– Line # : 9578---2610-422 Cannot execute the command on node <hostname>. The
resource manager IBM.RecoveryRM is not available.
Note: If you see these errors, this indicates that the IBM Tivoli System Automation for Multiplatforms
(SA MP) recovery resource manager daemon experienced a problem. The daemon serves as the
decision engine for Tivoli SA MP and is identified as IBM.RecoveryRM in the system. Diagnostic data
will be written by Tivoli SA MP to diagnose the problem.
• Tivoli SA MP diagnostic data is written into the directories /var/ct/db2domain/log/mc/ (error logs)
and /var/ct/db2domain/run/mc/ (core dumps) and /tmp/db2_cluster_manager_spooling/
(default trace directory).
Scenarios
Commands from Tivoli SA MP and RSCT that show information can be run safely; however, it is not
recommended to run commands that change a state or cause actions to occur in Tivoli SA MP or RSCT.
Db2 cluster services controls these resources through the configured policies, so external changes might
result in unexpected actions.
Note: Only the most important mappings are shown in this topic.
The Db2 cluster services tooling commands that show this information are:
• db2cluster -list -alert
• db2instance -list
When you run the lssam command, you might encounter the following scenarios:
• A resource is failed offline
– A failed offline resource may not indicate a problem, based on what is specified in the output. To see
if this maps to a problem, run the following commands:
– 1. db2instance -list
2. db2cluster -list -alert
– Run the db2cluster -list -alert command and follow the instructions returned by that
command. If you are unable to resolve the issue, proceed as follows:
- If a db2start command was issued, see “CF server failure” on page 580.
- If a db2start command was not issued, contact IBM Service.
– Failed offline states may be cleared via the db2cluster -clear -alert command if the alert
appears in the db2cluster -list -alert command output. It is not recommended to clear these
states via Tivoli SA MP or RSCT commands.
• A resource is pending online
– Wait for the resource to come online if a db2start command was issued, or if the resource is
undergoing a restart operation.
- If the state later moves to failed offline, see "A resource is failed offline".
– Run db2instance -list to see whether the instance has been started. Pending Online
states might appear in lssam output when the instance is stopped and is waiting for db2start to be
issued by the user.
• A resource is offline
Case 1: Messages exist in the install log or the db2diag log file
Symptoms
The initial symptom in this case is a hang during the installation process. More specifically, the hang
occurs during the process of creating the instance.
Diagnosis and resolution:
• Check the /tmp/db2setup.log and /tmp/db2icrt.log files. For this example, the following
message exists:
Creating resources for the instance "db2inst1" has failed.
There was an error with one of the issued cluster manager commands.
Refer to the db2diag log file and the Db2 Knowledge Center for details.
Check to see if you have a similar message.
• Check the /tmp/db2diag.log for messages similar to the following ones:
– Line # : 6884---2610-403 The resource is stale. or
– Line # : 9578---2610-422 Cannot execute the command on node <hostname>. The
resource manager IBM.RecoveryRM is not available.
• If you see these errors, this indicates the IBM Tivoli System Automation for Multiplatforms (SA MP)
recovery resource manager daemon experienced a problem. The daemon serves as the decision engine
for Tivoli SA MP and is identified as IBM.RecoveryRM in the system. Diagnostic data will be written by
Tivoli SA MP to diagnose the problem.
• Tivoli SA MP diagnostic data is written into the directories /var/ct/<domain_name>/log/mc/ (error
logs), /var/ct/<domain_name>/run/mc/ (core dumps), and /tmp/db2_cluster_manager_spooling/
(default trace directory).
• IBM service and development teams use trace and core files for troubleshooting. If you would like IBM
service to analyze the diagnostic data, gather the data listed under the topic “Manual trace and log file
collection” on page 562.
• Follow these instructions to upload data to IBM Technical Support:
– Submitting diagnostic information to IBM Technical Support for problem determination
• The IBM Technical Support website is a good source of information, where you can identify known
problems based on symptoms or error log messages
– Db2 support.
Case 2: No errors or messages in the install log or the db2diag log file
Symptoms
• The initial symptom is a hang during the installation process.
• The state of the hang is such that there might not be any messages reported into the /tmp/
db2setup.log install log, or the log does not exist
Diagnosis and resolution:
• If the /tmp/db2setup.log and/or /tmp/db2icrt.log exist, check if you have a similar message
to:
– Creating resources for the instance "db2inst1" has failed.
– There was an error with one of the issued cluster manager commands. Refer
to the db2diag log file and the Db2 Knowledge Center for details.
• If the /tmp/db2diag.log exists check for messages similar to the following ones:
– Line # : 6884---2610-403 The resource is stale. or
– Line # : 9578---2610-422 Cannot execute the command on node <hostname>. The
resource manager IBM.RecoveryRM is not available.
• If you see these errors, this indicates the Tivoli SA MP recovery resource manager daemon experienced
a problem. The daemon serves as the decision engine for Tivoli SA MP and is identified as
IBM.RecoveryRM in the system. Diagnostic data will be written by Tivoli SA MP to diagnose the problem.
• Tivoli SA MP diagnostic data is written into the directories /var/ct/<domain_name>/log/mc/ (for
error logs) and /var/ct/<domain_name>/run/mc/ (core dumps) and /tmp/
db2_cluster_manager_spooling/ (default trace directory). The value for domain_name can be
obtained by running:
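The command to obtain the domain name is not shown in this excerpt. One hedged possibility, based on
the lsrpdomain command that is referenced elsewhere in this documentation for listing the RSCT peer
domain, is:
lsrpdomain
The Name column of the lsrpdomain output typically contains the value to substitute for domain_name.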
• If the /tmp/db2setup.log or /tmp/db2diag.log files do not exist or are empty, gather as much of
the remaining data listed under the topic “Manual trace and log file collection” on page 562 as possible.
Contact IBM service for assistance.
• Follow these instructions to upload data to IBM Technical Support:
– Submitting diagnostic information to IBM Technical Support for problem determination
Manually update IBM Spectrum Scale to meet Db2 pureScale Feature requirements (AIX)
IBM Spectrum Scale is installed as part of the Db2 pureScale Feature installation. However, during a Db2
pureScale Feature installation, if an IBM Spectrum Scale cluster already exists but was not created by the
Db2 installer, or if you manually updated the IBM Spectrum Scale level, a failure can occur when you
install the Db2 pureScale Feature. A failure occurs if the already installed IBM Spectrum Scale level does
not match the release or efix level that is required by the Db2 installer.
Procedure
Perform these steps on one host at a time, until all hosts are updated.
Note: To avoid cluster failover and reelecting a cluster manager, update the cluster manager last.
1. Log on as root.
2. Compare the existing IBM Spectrum Scale release and efix level already installed, to the levels on the
installation media.
To verify the IBM Spectrum Scale release already installed on the system, run:
<DB2-image-directory>/db2/aix/gpfs/db2ckgpfs -v install
To verify the IBM Spectrum Scale release on the installation media, run:
<DB2-image-directory>/db2/aix/gpfs/db2ckgpfs -v media
If the IBM Spectrum Scale release level already installed is at a lower level than the level on the
installation media, an IBM Spectrum Scale release level update is required. Otherwise, continue to
verify the IBM Spectrum Scale efix level.
To verify the IBM Spectrum Scale efix level already installed on the system, run:
<DB2-image-directory>/db2/aix/gpfs/db2ckgpfs -s install
To verify the IBM Spectrum Scale efix level on the installation media, run:
<DB2-image-directory>/db2/aix/gpfs/db2ckgpfs -s media
If the installed IBM Spectrum Scale efix level is at a lower level than the level on the installation
media, an IBM Spectrum Scale level update is required.
3. The remaining steps must be performed one host at a time on all hosts.
/usr/lpp/mmfs/bin/mmshutdown
/usr/lpp/mmfs/bin/mmgetstate -a
/usr/lpp/mmfs/bin/mmfsenv -u
Note: If the command indicates that kernel extensions are "busy", the host must be rebooted after
upgrading IBM Spectrum Scale to ensure that the updated kernel extension is loaded. Alternatively,
identify and kill the processes that have a current working directory or open file handles on the IBM
Spectrum Scale cluster. If, after you terminate the processes, rerunning the command no longer
indicates a "busy" state, a reboot can be avoided.
6. Update either the IBM Spectrum Scale release level or IBM Spectrum Scale modification level.
Perform one of these steps:
• To update the IBM Spectrum Scale release level (for example, from 3.3 to 3.4), run:
<DB2-image-directory>/db2/aix/gpfs/installGPFS -g
• To update the IBM Spectrum Scale modification level (for example, from 3.3.1 to 3.3.4), run:
<DB2-image-directory>/db2/aix/gpfs/installGPFS -u
/usr/lpp/mmfs/bin/mmstartup
/usr/lpp/mmfs/bin/mmgetstate -a
/usr/lpp/mmfs/bin/mmmount all
10. Verify the IBM Spectrum Scale level now matches the release or efix level that is required by the Db2
installer.
Verify the IBM Spectrum Scale release installed on the system matches the IBM Spectrum Scale
release on the installation media:
<DB2-image-directory>/db2/aix/gpfs/db2ckgpfs -v install
<DB2-image-directory>/db2/aix/gpfs/db2ckgpfs -v media
Verify that the IBM Spectrum Scale efix level installed on the system matches the IBM Spectrum Scale
efix level on the installation media:
<DB2-image-directory>/db2/aix/gpfs/db2ckgpfs -s install
<DB2-image-directory>/db2/aix/gpfs/db2ckgpfs -s media
11. Repeat steps 3 to 10 on the next host until all hosts are updated.
Results
All IBM Spectrum Scale hosts are now at the required code level for Db2 pureScale Feature.
Manually update IBM Spectrum Scale to meet Db2 pureScale Feature requirements (Linux)
IBM Spectrum Scale is installed as part of the Db2 pureScale Feature installation. However, during a Db2
pureScale Feature installation, if an IBM Spectrum Scale cluster already exists but was not created by the
Db2 installer, or if you manually updated the IBM Spectrum Scale level, a failure can occur when you
install the Db2 pureScale Feature. A failure occurs if the already installed IBM Spectrum Scale level does
not match the release or efix level that is required by the Db2 installer.
Procedure
To update IBM Spectrum Scale to meet Db2 pureScale Feature requirements:
1. Log on as root.
2. Compare the existing IBM Spectrum Scale release and efix level already installed, to the levels on the
installation media.
To verify the IBM Spectrum Scale release already installed on the system, run:
<DB2-image-directory>/db2/linuxamd64/gpfs/db2ckgpfs -v install
To verify the IBM Spectrum Scale release on the installation media, run:
<DB2-image-directory>/db2/linuxamd64/gpfs/db2ckgpfs -v media
If the IBM Spectrum Scale release level already installed is at a lower level than the level on the
installation media, an IBM Spectrum Scale release level update is required. Otherwise, continue to
verify the IBM Spectrum Scale efix level.
To verify the IBM Spectrum Scale efix level already installed on the system, run:
<DB2-image-directory>/db2/linuxamd64/gpfs/db2ckgpfs -s install
To verify the IBM Spectrum Scale efix level on the installation media, run:
<DB2-image-directory>/db2/linuxamd64/gpfs/db2ckgpfs -s media
If the installed IBM Spectrum Scale efix level is at a lower level than the level on the installation
media, an IBM Spectrum Scale level update is required.
/usr/lpp/mmfs/bin/mmshutdown
/usr/lpp/mmfs/bin/mmgetstate -a
5. Update either the IBM Spectrum Scale release level or IBM Spectrum Scale modification level.
Perform one of these steps:
• To update the IBM Spectrum Scale release level (for example, from 3.3 to 3.4), run:
<DB2-image-directory>/db2/linuxamd64/gpfs/installGPFS -g
• To update the IBM Spectrum Scale modification level (for example, from 3.3.1 to 3.3.4), run:
<DB2-image-directory>/db2/linuxamd64/gpfs/installGPFS -u
6. After updating either the IBM Spectrum Scale release level or IBM Spectrum Scale modification level,
you must build the GPFS portability layer by issuing these commands:
cd /usr/lpp/mmfs/src
make Autoconfig
make World
make InstallImages
/usr/lpp/mmfs/bin/mmstartup
/usr/lpp/mmfs/bin/mmgetstate -a
/usr/lpp/mmfs/bin/mmmount all
10. Verify the IBM Spectrum Scale level now matches the release or efix level that is required by the Db2
installer.
Verify the IBM Spectrum Scale release installed on the system matches the IBM Spectrum Scale
release on the installation media:
<DB2-image-directory>/db2/linuxamd64/gpfs/db2ckgpfs -v install
<DB2-image-directory>/db2/linuxamd64/gpfs/db2ckgpfs -v media
Verify that the IBM Spectrum Scale efix level installed on the system matches the IBM Spectrum Scale
efix level on the installation media:
<DB2-image-directory>/db2/linuxamd64/gpfs/db2ckgpfs -s install
<DB2-image-directory>/db2/linuxamd64/gpfs/db2ckgpfs -s media
11. Repeat steps 3 to 10 on the next host until all hosts are updated.
Results
All IBM Spectrum Scale hosts are now at the required code level.
What to do next
Continue installing Db2 pureScale Feature.
Frequently asked questions about Db2 pureScale Feature host validation problems
The following sections provide possible solutions to problems you might encounter when attempting to
validate remote hosts.
What if the global registry variable record on the host indicates a GPFS Cluster already exists?
In some cases, the clean up of the global registry might not have been complete, leaving behind a record
(GPFS_CLUSTER) that indicates there is a GPFS cluster in use by Db2 on the host when in fact there is not
one. Contact IBM Software Support.
What if the Db2 installer detects an existing RSCT peer domain on a host?
During a Db2 installation, only one RSCT peer domain can be active at a time. The peer domain that was
not created by the Db2 installer must be stopped or removed before you create an RSCT peer domain to
be used by the IBM Db2 pureScale Feature.
To stop the RSCT peer domain by using the db2cluster command on the host db2host1, log on to a host
that belongs to the same active RSCT peer domain and run the db2cluster -cm -stop -host db2host1
command. If the db2cluster command is not available, run the stoprpdomain <domain name>
command. Run the lsrpdomain command to determine the domain name to specify.
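For example, a hedged sequence for the fallback case (the domain name db2domain_20230101120000 is
hypothetical; use the name that lsrpdomain reports):
lsrpdomain
stoprpdomain db2domain_20230101120000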
If an attempt to validate a host does not finish within the timeout period, it times out. Check the
connections to the host and try to add it again. You can also change the timeout variable and rerun the
installation command.
To remove the RSCT peer domain:
1. Remove the remote host from the host list.
2. If the remote host "db2host1" belongs to a different Db2 pureScale instance, remove it from that Db2
pureScale instance using the "db2iupdt -drop" command.
To remove the host from a Db2 pureScale instance:
1. Log on to a different host that belongs to the same Db2 pureScale instance.
2. Run the db2iupdt -drop command against the host that you want to remove (a hedged sketch follows).
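The exact command is not shown in this excerpt. As a hedged sketch only, the drop operation described
above might look like the following; the -m option, the host name db2host1, and the instance name
db2sdin1 are assumptions used purely for illustration, so confirm the syntax in the db2iupdt command
reference:
db2iupdt -drop -m db2host1 db2sdin1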
To remove a remote host that does not belong to a Db2 instance, run one of the following commands:
• db2cluster -cm -remove -host <hostname>
• rmrpnode <hostname>
What if there is a conflict between the DEFAULT_INSTPROF record and the instance shared directory
specified?
The Db2 installer has detected a conflict between the DEFAULT_INSTPROF record and the instance
shared directory specified. Do not specify the instance shared directory. The DEFAULT_INSTPROF record
in the global registry indicates the instance shared file system has already been set up by Db2.
In this case, the following options or keywords are not needed.
• For a response file installation: INSTANCE_SHARED_DEVICE_PATH and INSTANCE_SHARED_DIR.
• For db2icrt or db2iupdt: instance_shared_dev and instance_shared_dir.
If the value for INSTANCE_SHARED_DIR / instance_shared_dir matches the existing instance shared
file system mount point, the Db2 installer still allows the installation to proceed. However, if the value
does not match, the installation fails.
What if the cluster interconnect netname for a target host failed to ping the IIH?
When adding a new host, the Db2 installer checks to see if the new host can send a small data packet to
the IIH and receive a reply. This send-reply test is commonly known as a ping; this message is the result
of a failed ping. If there is a problem, you can verify the results by running this command from the
remote host's console: ping IIH_address_or_name.
If this test fails, ensure that the ssh communication between the IIH and the remote hosts is set up
correctly.
After you verify that the problem occurs outside the Db2 installer, there are various things that you can
check to find the source of the problem: for example, a bad physical connection (such as a loose cable), a
faulty network adapter driver, or an improperly set up network. Check the network adapter and cable, or
choose a different one.
What if the cluster interconnect netname for a target host is not on the same subnet as the IIH?
You can reconfigure the network adapter or choose a different one. This error occurs when the cluster
interconnect netname for a host is not on the same subnet as the installation-initiating host. All cluster
interconnects for CF need to be on the same subnet for performance reasons (all hosts within the same
subnet can usually be reached in one routing hop).
For example if the cluster interconnect network is configured with the network address 192.168.0.0/24,
then all cluster interconnect netname addresses should start with 192.168.0 (for example, 192.168.0.1,
192.168.0.2, and so on).
Check the network card configuration on the new host (for example, run the ifconfig -a command)
and check /etc/hosts if a name was used instead of an address.
Frequently asked questions about installation, instance creation, and rollback problems with the
Db2 pureScale Feature
Use the answers in this list of frequently asked questions to help you find possible solutions to
problems that might arise during the installation process of the IBM Db2 pureScale Feature.
Ensure that you issue any installation or creation commands with the debug (-d) parameter to generate
the trace file, which will assist in any debugging attempts.
What is a rollback?
A rollback occurs when an operation is unsuccessful and the Db2 installer cleans up the
environment. There can be an instance rollback and a Db2 binaries rollback. If the Db2 installer fails to
create a Db2 instance, it triggers a rollback of the instance on all hosts. An instance rollback does not
roll back Db2 binaries.
A partial rollback occurs when you can set up an instance, but it cannot be extended to a particular host.
What if my instance was only partially created? What if my instance was created, but some of my
members and CFs were not created?
During an installation, your instance might be created only on some of the target hosts. An instance
rollback happens on the hosts where the instance creation did not complete, without triggering
a binary rollback. You receive a post-installation message showing the hosts that were not
included in the instance because of an error.
Resolve any errors outlined in the installation logs. After you've resolved the problem, you can run the
db2isetup or db2iupdt command to add the members or CFs:
• You can validate the host and add it to the instance by using the Db2 Instance Setup wizard (by issuing
the db2isetup command).
• You can extend your instance by issuing the db2iupdt -add command from a host that already
belongs to the instance.
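As a hedged illustration of the second option, adding a member back into the instance from a host that
already belongs to it might look like the following; the host name hostB, the netname hostB-ib0, and the
instance name db2sdin1 are hypothetical, and the exact option syntax should be confirmed in the
db2iupdt command reference:
db2iupdt -add -m hostB:hostB-ib0 db2sdin1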
What if the IBM Spectrum Scale cluster fails to extend to the target host?
If the IBM Spectrum Scale cluster extension to the host fails, you can review the log file, found in the
DB2DIR/tmp directory, to determine why the extension was prevented on that host.
What if the IBM Spectrum Scale cluster was not deleted during a failed installation?
If a failed attempt to install a Db2 product results in a rollback, the Db2 installer might be unable to
remove the newly created IBM Spectrum Scale file system. To understand why the IBM Spectrum Scale
cluster was not removed, review the log file found in the DB2DIR/tmp directory.
What do I do when I cannot bring the disks online after a storage failure on one site was rectified?
If nodes come online before the storage device, you must ensure that the disk configurations are defined
and available before you try to restart the failed disk. If the Device and DevType fields are marked by a
- when you try to list the network shared disk (NSD) using the mmlsnsd -X command, you must ensure
that the device configurations for the disk services are available before attempting to restart the disks.
Please consult the operating system and device driver manuals for the exact steps to configure the
device. On AIX platforms, you can run the cfgmgr command to automatically configure devices that have
been added since the system was last rebooted.
What do I do if a computer's IP address, used for the IB interface, cannot be pinged after a reboot?
Ensure that the InfiniBand (IB) related devices are available. If the devices are not available, bring them
online with the chdev command.
Ensure that the ib0, icm, and iba0 properties are set correctly, that ib0 references an IB adapter such as
iba0, and that the properties are persistent across reboots. Use the -P option of chdev to make changes
persistent across reboots.
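A hedged sketch of checking the devices and bringing the interface up; the use of ib0 as the interface
name follows the lsdev output shown later in this section, and the state attribute value is an assumption
to verify against your AIX documentation:
lsdev -C | grep ib
chdev -l ib0 -a state=up
chdev -l ib0 -a state=up -P
The second chdev invocation with -P records the change in the ODM so that it persists across reboots.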
What do I do if access to the IBM Spectrum Scale file systems hangs for a long time on a storage
controller failure?
Ensure the device driver parameters are set properly on each machine in the cluster.
GPFS Deadman Switch timer has expired and there are still outstanding I/O requests
If this is the case, then ensure that the device driver parameters have been properly set
What happens when one site loses Ethernet connectivity and the LPARs on that site are expelled
from the IBM Spectrum Scale cluster?
If the IBM Spectrum Scale cluster manager is on the tiebreaker site, this behavior is expected, because
the cluster manager does not have IB or Remote Direct Memory Access (RDMA) over Converged Ethernet
(RoCE) connectivity and can no longer talk to the site that has lost Ethernet connectivity. If the IBM
Spectrum Scale cluster manager is not on the tiebreaker site, but is on the site that retains Ethernet
connectivity, ensure that the tiebreaker site is an IBM Spectrum Scale quorum-client, not a quorum-
manager, as per the mmaddnode command. If the tiebreaker host is a quorum-manager, its status can be
What do I do when one site loses Ethernet connectivity and the members on that site remain stuck in
STOPPED state instead of doing a restart light and going to WAITING_FOR_FAILBACK state?
Ensure that LSR has been disabled.
How can I remove unused IBM Spectrum Scale Network Shared Disks (NSD)?
Scenarios that can lead to the need to manually remove unused NSDs:
1. A user-driven or abnormal termination of the db2cluster CREATE FILESYSTEM or ADD DISK command.
2. The unused NSDs were created manually at some point earlier but left in the system.
The free NSDs must be removed before they can be used by the db2cluster command with either the
CREATE FILESYSTEM or the ADD DISK option. Use the following instructions to remove them:
Note: Run all the following commands on the same host.
1. Run mmlsnsd -XF to list the free NSD and its corresponding device name.
---------------------------------------------------------------------------------------------------
gpfs2118nsd 09170151FFFFD473 /dev/hdisk7 hdisk coralpib21a.torolab.ibm.com
2. Find the NSD name that matches the target device to be removed.
gpfs2118nsd
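The removal step itself is not shown in this excerpt. A hedged sketch, assuming that the free NSD
identified above (gpfs2118nsd) is the one to remove and that the IBM Spectrum Scale mmdelnsd
command is the intended removal mechanism:
/usr/lpp/mmfs/bin/mmdelnsd gpfs2118nsd
Rerun mmlsnsd -XF afterward to confirm that the NSD no longer appears in the free list.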
What if the "PEER_DOMAIN" global registry variable record on the target hosts was not created?
As with any errors, review the installation log for specific details. If the log indicates that the
PEER_DOMAIN global registry variable record was not created on the specified hosts, contact IBM
Software Support.
Where can I find the IBM Db2 pureScale Feature sample response file?
The Db2 pureScale Feature sample response file, db2dsf.rsp, is located in DB2DIR/install/db2/
platform/samples directory, where platform refers to the appropriate operating system.
Post-installation
This section contains information that will help you understand, isolate, and resolve problems that you
might encounter after the installation of the IBM Db2 pureScale Feature.
Symptoms
A Db2 instance fails to start on the execution of the db2start command.
• For more information about the listed alerts, run the db2cluster -cm -list -alert command. For
example, the db2cluster -cm -list -alert command might return something like the following
alert:
Alert: Db2 member '0' failed to start on its home host 'host01'. The
cluster manager will attempt to restart the Db2 member in restart
light mode on another host. Check the db2diag log file for messages
concerning failures on hosts 'host01' for member '0'
Action:
This alert must be cleared manually with the command:
db2cluster -cm -clear -alert.
Impact: Member 0 will not be able to service requests until
this alert has been cleared and the member returns to its home host.
• Check the <instance_owner>.nfy log for information about when the failure occurred. Look in this
member's db2diag log file for more details on why the failure occurred. Look for error messages that are
related to db2rstar or db2rstop in the db2diag log file.
• The system error log for the affected host can also be consulted if the cause of the error is still
unknown. For example:
– In the output shown previously, member 0 is not started.
– Login to host01 and view the system error log by running the errpt -a command (AIX) or looking at
the /var/log/messages (Linux).
– In the system error log, look for related log entries at the time of the failure.
• If an alert was shown from db2cluster -list -alert, run db2cluster -clear -alert after
the problem is resolved, and try the db2start command again.
CF server failure
Use the information in this topic to help you diagnose if a cluster caching facility (CF) component failed.
Symptoms
A Db2 instance fails to start on the execution of the db2start command.
• If any alerts are present, run db2cluster -cm -list -alert for more information. The alerts will
provide more information about what might need to be fixed (for example, a network adapter or host is
offline), or point to the cfdiag*.log files for more information.
• Look for related errors in the CF's db2diag log file that pertain to the time when the db2start
command was run:
• Search the sections of the db2diag log file that precede the previous trace point for more information
about why the CF did not start. For example, if cluster services cannot start a CF, the db2diag log file
might show:
• Each CF writes information to the cfdiag*.log files and dumps more diagnostic data when required. The
files reside in the directory set by the cf_diagpath database manager configuration parameter or, if it is
not set, by diagpath; the default is $INSTHOME/sqllib_shared/db2dump/ $m.
– CF diagnostic log files (cfdiag-<timestamp>.<cf_id>*.log)
- Each of these files keeps a log of the activities that are related to a CF. Events, errors, warnings, or
additional debugging information will be logged there. This log has a similar structure to the
db2diag log file. A new log is created each time that a CF starts. The logging level is controlled by
the cf_diaglevel database manager configuration parameter.
- Note that there is a static CF diagnostic log name that always points to the most current diagnostic
logging file for each CF and has the following format: cfdiag.<cf_id>.log
– CF output dump diagnostic files cfdump.YYYYMMDDhhmmssuuuuuu.<host>.<cf_id>.out
- These files contain information regarding CF startup and stop. There might be some additional
output in these files.
– Management LightWeight Daemon diagnostic log file (mgmnt_lwd_log.<cf_pid>)
- This log file displays the log entries that pertain to the LightWeight Daemon (LWD) process for a
particular CF. Errors in this log file indicate that the LWD has not started properly.
– CF stack files (CAPD.<cf_pid>.<tid>.thrstk)
- These are stack files produced by the CF when it encounters a signal. These files are important for
diagnosing a problem with the CF.
– CF trace files (CAPD.tracelog.<cf_pid>)
- A default lightweight trace is enabled for the CF.
- These trace files appear whenever the CF terminates or stops.
- The trace files might indicate a problem with the CF, but these files are useful for diagnosing errors
only when used in combination with other diagnostic data.
• If the cfdump.out.* file does not contain the "cluster caching facility initialized" line or "cluster
caching facility Object Information" and other lines shown in the following example, the CF process did
not start successfully. An error message might be shown instead. Contact IBM Support for more
information.
• In this example, cfdiag-20091109015035000037.128.log contains a successful process start. If
the CF did not start properly, this log might be empty or contain error messages.
• Look for core files or stack traceback files in the CF_DIAGPATH directory.
• The system error log for the affected host might also be consulted if the cause of the error is still
unknown. Log onto the CF host that has not been started and view the system error log by running the
errpt -a command (on Linux, look in the /var/log/messages file). Look for related log entries at
the time of the failure. In the example shown here, login to host04 and host05, because CF 128 and CF
129 reside on these hosts.
• If an alert was shown from db2cluster -list -alert, run db2cluster -clear -alert after
the problem is resolved, and then reissue the db2start command.
The db2diag log file might also show messages similar to the following ones:
These messages indicate a communication error between a CF and a member. Follow these steps:
1. Locate the pdLogCfPrintf messages and search for the message string CF RC=. For example, CF
RC= 2148073491.
2. Take the numeric value adjacent to this string; in this example it is 2148073491. This value
represents the reason code from the network or communication layer.
3. To find more details on this error, run the db2diag tool with the -cfrc parameter followed by the
value. Example: db2diag -cfrc 2148073491.
4. If the system was recently enabled with uDAPL and InfiniBand, check your uDAPL configuration.
5. Ping the IB hostnames of the CFs from each member host that is showing the previously listed errors,
and ping the IB hostnames of those members from the CF hosts.
6. If pinging the IB hostnames fails, verify that the port state is up. To verify if the port state is up, run
ibstat -v. In the following example, the link should be good because Physical Port Physical State
has a value of Link Up, Logical Port State has a value of Active, and Physical Port State has a value of
Active:
$ ibstat -v
------------------------------------------------------------------------------
IB NODE INFORMATION (iba0)
------------------------------------------------------------------------------
Number of Ports: 2
Globally Unique ID (GUID): 00.02.55.00.02.38.59.00
Maximum Number of Queue Pairs: 16367
Maximum Outstanding Work Requests: 32768
Maximum Scatter Gather per WQE: 252
Maximum Number of Completion Queues: 16380
Maximum Multicast Groups: 32
Maximum Memory Regions: 61382
Maximum Memory Windows: 61382
Hw Version info: 0x1000002
Number of Reliable Datagram Domains: 0
Total QPs in use: 3
Total CQs in use: 4
Total EQs in use: 1
Total Memory Regions in use: 7
Total MultiCast Groups in use: 2
Total QPs in MCast Groups in use: 2
EQ Event Bus ID: 0x90000300
EQ Event ISN: 0x1004
NEQ Event Bus ID: 0x90000300
NEQ Event ISN: 0x90101
------------------------------------------------------------------------------
IB PORT 1 INFORMATION (iba0)
------------------------------------------------------------------------------
Global ID Prefix: fe.80.00.00.00.00.00.00
Local ID (LID): 000e
Local Mask Control (LMC): 0000
Logical Port State: Active
Physical Port State: Active
Physical Port Physical State: Link Up
Physical Port Speed: 2.5G
Physical Port Width: 4X
Maximum Transmission Unit Capacity: 2048
Current Number of Partition Keys: 1
Partition Key List:
P_Key[0]: ffff
Current Number of GUID's: 1
Globally Unique ID List:
GUID[0]: 00.02.55.00.02.38.59.00
$ lsdev -C | grep ib
fcnet0 Defined 00-08-01 Fibre Channel Network Protocol Device
fcnet1 Defined 00-09-01 Fibre Channel Network Protocol Device
ib0 Available IP over Infiniband Network Interface
iba0 Available InfiniBand host channel adapter
icm Available Infiniband Communication Manager
• If setup was performed correctly, and the hardware is functioning correctly, all three values should
be 'Available'.
• If the network interface is not 'Available', you can change the device state manually by using the
chdev command.
• If iba0 or icm are not in the Available state, check for errors on the device. To check for errors on
the device, run /usr/sbin/cfgmgr -vl iba0 or /usr/sbin/cfgmgr -vl icm as a root user.
• If iba0 is not found or remains in the Defined state, confirm that the Host Channel Adapter is
currently assigned to the host on the HMC.
10. Verify that the cf-server processes were running on the CF server hosts at the time of the error. If the
CF hosts were not up, not initialized, or were restarted at that time, check
cfdump.out*, cfdiag*.log, and core files for more details. However, if the CF servers were up
and initialized at the time of the error (that is, db2instance -list at that time showed the primary
CF as PRIMARY and the secondary CF in PEER state), then there might be a uDAPL communication
problem.
11. If a db2start command or a CONNECT statement was issued, to determine whether there is a
different failure that caused these errors to appear as a side effect, see “CF server failure” on page
580.
12. If this is not the case, a trace of the failing scenario is often useful to determine the cause for the
error. If CF trace was enabled, dump it. To dump CF trace, run the following command: db2trc cf
dump fileName where you define the value for the fileName parameter.
13. To enable CF trace if it was not already enabled, run the following command: db2trc cf on -m
"*.CF.xport_udapl.*.*" .
14. IBM Service might additionally request an AIX system trace and AIX memory traces to facilitate
problem determination.
15. If CF trace on xport_udapl and any AIX system trace were recorded, collect this information. Run
the db2support command to collect further diagnostic logs. Run snap -Y as root on all hosts, and
contact IBM Service for further help.
2. Turn the trace on. To turn on the trace, use the trace command (a hedged example follows the
parameter list), where:
• The default trace mode is used. Two global buffers are used to continuously gather trace data, with
one buffer being written to the log file, while data is gathered in the other buffer.
• -T buffersize specifies the trace buffer size
• -L filesize specifies the output trace log file size
• -a specifies to run the trace in the background
• -j trace_hooks specifies the type of event to trace. Multiple events can be specified as a comma-
separated list
Note: Values for trace mode, buffer scope, buffersize, filesize, and trace_hooks depend on the problem
that is being experienced. Contact IBM Service for recommended values.
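As a hedged example only, an invocation that combines the parameters described above might look like
the following; the buffer size, log file size, output file name, and trace hook IDs are placeholders to be
confirmed with IBM Service:
trace -a -T 8000000 -L 16000000 -j <trace_hooks> -o /tmp/ib_trace.raw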
3. Reproduce the problem
4. Turn off the trace. Use the trcstop command.
5. Dump the trace buffer to a file. Use the trcrpt command:
where filename specifies the file that you dump the trace buffer to. For information about the AIX tracing
facility, see the AIX Information Center: Trace Facility.
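As a hedged example of this step, assuming the raw trace log was written to /tmp/ib_trace.raw as in the
earlier sketch (both file names are placeholders):
trcrpt -o /tmp/ib_trace.report /tmp/ib_trace.raw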
AIX also supports in-memory-only traces for some components. These include default-on traces that
were selected to have minimal performance impact. The traces are not written to a disk log
file without explicit action by IBM Service personnel. To increase the InfiniBand in-memory tracing to a
detailed level, use the following command:
Procedure
Use the following steps to verify your uDAPL configurations:
1. Verify that the InfiniBand (IB) ports are functional on all hosts, and that the physical port states are
ACTIVE.
2. On the CF hosts, verify that the IP address associated with the IB ports matches the IP addresses
used for the net names for the CF entry in the db2nodes.cfg file.
a) View the IP address that is associated with the IB ports on the CF host.
To view the IP address that is associated with the IB port, run the ifconfig -a command. The IP
address can be found by looking at the address that is associated with the inet field as shown:
coralpib23:/coralpib23/home/lpham> ifconfig -a
ib0: flags=e3a0063<UP,BROADCAST,NOTRAILERS,RUNNING,ALLCAST,MULTICAST,LINK0,LINK1,GROUPRT,64BIT>
inet 10.1.1.23 netmask 0xffffff00 broadcast 10.1.1.255
tcp_sendspace 262144 tcp_recvspace 262144 rfc1323 1
In the output, ib0 is the interface name. The status is UP, and the IP address is 10.1.1.23. It is
important to ensure that the interface status is up.
b) Ensure the network names for the CF in the db2nodes.cfg file match with the IP addresses for
the intended IB port to use for the CF.
You must also ensure that the name can be pinged, and is reachable from all hosts on the cluster.
From each member host, run a ping command against the network names that are associated with
the CF entry in the db2nodes.cfg file. Observe the IP address returned. The IP address must
match the IP address that is associated with the IB port configuration at the CF host, as in the
ifconfig -a output.
Note: When you ping an IP address on a different subnet, the pings are unsuccessful. This occurs
when you have multiple subnet masks for each interface when there are multiple interfaces defined
for the CF. In this case, from the member, ping the target IP address on the CF host that has the
same subnet mask as the interface on the member host.
3. Verify that the uDAPL interface is configured in the /etc/dat.conf file on all hosts, and that the right
adapter port value is used.
Since Db2 pureScale uses uDAPL 2.0, look for the first entry that has u2.0 in the second column with
the matching interface name and port number. The following entry might look similar to the entry in
your /etc/dat.conf file:
hca2 u2.0 nonthreadsafe default /usr/lib/libdapl/libdapl2.a(shr_64.o) IBM.1.1 "/dev/iba0 1 ib0" " "
In the example, hca2 is the unique transport device name for the uDAPL interface. The u2.0 indicates
that the entry is for a uDAPL 2.0 application. You must ensure that the /usr/lib/libdapl/
libdapl2.a file exists, because it is the uDAPL shared library. The /dev/iba0 1 ib0 output is the uDAPL
provider-specific instance data. In this case, the adapter is iba0, the port is 1, and the interface name
is ib0.
If the CF is configured with multiple interfaces by using multiple netnames in the db2nodes.cfg file,
you must ensure that all the interfaces are defined in the dat.conf file.
Note: The /etc/dat.conf file must only contain entries for the adapters that are in the local host.
The sample /etc/dat.conf file that is installed by default typically contains irrelevant entries. To
avoid unnecessary processing of the file, make the following changes:
• Move all the Db2 pureScale cluster-related adapter entries to the top of the file.
• Comment out the irrelevant entries or remove them from the file.
b) To determine the port value that is used by the member on the connect request, look for the
PsOpen event in the Db2 member diagnostic log (db2diag.log) file.
Look for the value of the caport field. In the following example, the port value for the target CF is
also 37761.
2013-04-29-16.00.56.371442-240 I80540A583 LEVEL: Event
PID : 10354874 TID : 772 PROC : db2sysc 0
INSTANCE: lpham NODE : 000
HOSTNAME: coralpib23
EDUID : 772 EDUNAME: db2castructevent 0
FUNCTION: Db2, Shared Data Structure Abstraction Layer for CF, SQLE_SINGLE_CA_HANDLE::sqleSingleCfOpenAndConnect,
probe:1264
DATA #1 : <preformatted>
PsOpen SUCCESS: hostname:coralpib23-ib0 (member#: 128, cfIndex: 1) ; device:hca2 ; caport:37761 ; transport: UDAPL
Connection pool target size = 9 conn (seq #: 3 node #: 1)
Procedure
Use the following steps to verify your uDAPL configurations:
1. Examine the physical port states by running the ibstat -v command.
Ensure that the State is Active, and the Physical State is LinkUp as shown in the following example:
CA 'mthca0'
CA type: MT25208 (MT23108 compat mode)
Number of ports: 2
Firmware version: 4.7.400
Hardware version: a0
Node GUID: 0x0005ad00000c03d0
System image GUID: 0x0005ad00000c03d3
Port 1:
State: Active
Physical state: LinkUp
Rate: 10
Base lid: 16
LMC: 0
SM lid: 2
Capability mask: 0x02510a68
If the port State is not Active, check the cable for connectivity.
2. On the CF hosts, verify that the IP address associated with the IB ports matches the IP addresses
used for the net names for the CF entry in the db2nodes.cfg file.
a) View the IP address that is associated with the IB ports on the CF host.
To view the IP address that is associated with the IB port, run the ifconfig -a command. The IP
address can be found by looking at the address that is associated with the inet addr field as
shown:
coralxib20:/home/svtdbm3 >ifconfig -a
ib0 Link encap:UNSPEC HWaddr 80-00-04-04-FE-80-00-00-00-00-00-00-00-00-00-00
inet addr:10.1.1.120 Bcast:10.1.1.255 Mask:255.255.255.0
inet6 addr: fe80::205:ad00:c:3d1/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:65520 Metric:1
RX packets:18672 errors:0 dropped:0 overruns:0 frame:0
TX packets:544 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:256
RX bytes:2198980 (2.0 Mb) TX bytes:76566 (74.7 Kb)
In the output, ib0 is the interface name. The status is UP, and the IP address is 10.1.1.120. It is
important to ensure that the interface status is up.
b) Ensure the network names for the CF in the db2nodes.cfg file match with the IP addresses for
the intended IB port to use for the CF.
You must also ensure that the name can be pinged, and is reachable from all hosts on the cluster.
From each member host, run a ping command against the network names that are associated with
the CF entry in the db2nodes.cfg file. Observe the IP address returned. The IP address must
match the IP address that is associated with the IB port configuration at the CF host, as in the
ifconfig -a output.
Note: When you ping an IP address on a different subnet, the pings are unsuccessful. This occurs
when you have multiple subnet masks for each interface when there are multiple interfaces defined
for the CF. In this case, from the member, ping the target IP address on the CF host that has the
same subnet mask as the interface on the member host.
3. Verify that the uDAPL interface is configured in the /etc/dat.conf file on all hosts, and that the right
adapter port value is used.
Since Db2 pureScale uses uDAPL 2.0, look for the first entry that has u2.0 in the second column with
the matching interface name and port number. On Linux, the adapter port value is not used, and is "0".
The following entry might look similar to the entry in your /etc/dat.conf file on SLES, or /etc/rdma/
dat.conf file on RHEL:
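The sample entry itself is not reproduced in this excerpt. A typical OFED-style dat.conf entry, shown here
as an assumption of what such a line commonly looks like rather than the exact text from the original
document, is:
ofa-v2-ib0 u2.0 nonthreadsafe default libdaplofa.so.2 dapl.2.0 "ib0 0" ""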
In the output, ofa-v2-ib0 is the unique transport device name for the uDAPL interface. The u2.0
indicates that the entry is for a uDAPL 2.0 application. You must ensure that the libdaplofa.so.2
file exists, because it is the uDAPL shared library. The ib0 0 output is the uDAPL provider-specific
instance data. In this case, the adapter is ib0, and the port is "0", because it is not used.
If the CF is configured with multiple interfaces by using multiple netnames in the db2nodes.cfg file,
you must ensure that all the interfaces are defined in the dat.conf file.
Note: The /etc/dat.conf file must only contain entries for the adapters that are in the local host.
The sample /etc/dat.conf file that is installed by default typically contains irrelevant entries. To
avoid unnecessary processing of the file, make the following changes:
• Move all the Db2 pureScale cluster-related adapter entries to the top of the file.
• Comment out the irrelevant entries or remove them from the file.
Procedure
To obtain an IBM Spectrum Scale trace, perform the following steps as the root user:
1. Make sure the /tmp/mmfs directory exists on all nodes.
Trace reports and internal dumps are written to this directory.
2. Set up the trace level on each IBM Spectrum Scale cluster by entering the following command:
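The command itself is not shown in this excerpt. One hedged possibility, assuming the IBM Spectrum
Scale mmtracectl utility is used to set the trace level (the trace class value def is a placeholder; confirm
the recommended classes and levels with IBM Support):
/usr/lpp/mmfs/bin/mmtracectl --set --trace=def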
What if my members or cluster caching facilities are not in STARTED, PEER, or CATCHUP state?
If one of your members or cluster caching facilities is not in the STARTED, PEER, or CATCHUP state,
perform the following steps:
1. Stop the instance by issuing the db2stop command.
2. Restart the instance by issuing the db2start command.
3. Review the messages from the failed db2start command, and resolve the problem.
4. If the problem persists, review the db2diag log file, which might have more details about the problem.
5. If reviewing the file does not yield a probable cause of the problem, perform the following steps:
a. Drop the member or cluster caching facility by issuing the db2iupdt -drop command.
b. Add the member or cluster caching facility again by issuing the db2iupdt -add command.
What if the db2iupdt -drop command failed to shrink the RSCT peer domain?
To manually remove failed hosts from the RSCT peer domain:
1. Check whether there are still resources attached to the peer domain by logging on to the failed host as
the root user and issuing the lssam command. The lssam command shows all resources in the
domain, not just those on a particular host. Review the listing to ensure that the host that you are
attempting to drop does not have any resources attached.
2. If no resources are attached, from the installation-initiating host (IIH), enter db2cluster -cm -
remove -host host_name .
3. If there are still resources attached, perform the following steps:
a. From the IIH, switch to the instance owner by entering su - instance_owner .
b. Remove the resource by entering db2cluster -cm -delete -resources. This step deletes all
resources in the cluster.
c. Switch back to root.
d. Remove the failed hosts from the RSCT peer domain by entering db2cluster -cm -remove -
host host_name.
e. Switch to the instance owner again.
f. Re-create the resources on the hosts that did not fail by entering db2cluster -cm -create -
resources.
Member 0 experienced a problem with its home host, hostA, and is running in light mode on hostC.
Member 0 shows that an ALERT has occurred. Run db2cluster -cm -list -alert to find out what
the ALERT is for.
Because hostA is unavailable, the state for the host is set to INACTIVE. Member 0 cannot fail back while
hostA is in the INACTIVE state; member 0 remains in the WAITING_FOR_FAILBACK state. Because
restart light failed on hostB, an alert was raised as an indication to the administrator to investigate the
problem.
If hostA becomes available again, its state will change from INACTIVE to ACTIVE. Member 0 will fail back
to hostA, and its state will change from WAITING_FOR_FAILBACK to STARTED. This is covered in further
detail in the related topic.
For information about diagnosing this symptom, see the link "HostA fails, restart light works on a hostC,
but not the first hostB".
Member shows alert while in STARTED state, all hosts are in ACTIVE state with no alerts
This symptom occurs as a follow-on from the scenario "Member WAITING_FOR_FAILBACK,
alert, corresponding host(s) is in INACTIVE state with alert". The output of the db2instance -list
command shows a member with an alert while its state is STARTED. All hosts are in ACTIVE state with no
alerts.
This is a sample output from the db2instance -list command showing a three member, two cluster
caching facility environment:
ID TYPE STATE HOME_HOST CURRENT_HOST ALERT PARTITION_NUMBER LOGICAL_PORT NETNAME
Member 0 still has its alert flagged as an indicator to the administrator to investigate the problem that
occurred on hostB that prevented the member from performing a restart light.
Note: The hostB problem is detailed in the parent topic Member WAITING_FOR_FAILBACK, alert,
corresponding host(s) is in INACTIVE state with alert.
For information about diagnosis, see the related link "Host (A) fails, restart light works on a host (C), but
not the first host (B), failback N/A".
Symptoms
The following sample output from the db2instance -list command shows an environment with three
members and two cluster caching facilities:
ID TYPE STATE HOME_HOST CURRENT_HOST ALERT PARTITION_NUMBER LOGICAL_PORT NETNAME
-- ---- ----- --------- ------------ ----- ---------------- ------------ -------
0 MEMBER WAITING_FOR_FAILBACK hostA hostC YES 0 1 hostC-ib0
1 MEMBER STARTED hostB hostB NO 0 0 hostB-ib0
2 MEMBER STARTED hostC hostC NO 0 0 hostC-ib0
128 CF PRIMARY hostD hostD NO - 0 hostD-ib0
129 CF PEER hostE hostE NO - 0 hostE-ib0
Member 0 experienced a problem with its home host, hostA, and attempted a restart light on hostB.
However, the restart light failed on hostB. The member then attempted a restart light on hostC, which
was successful.
If hostA becomes available again, its state will change from INACTIVE to ACTIVE. Member 0 will fail back
to hostA, and the state of the member will change from WAITING_FOR_FAILBACK to STARTED.
ID TYPE STATE HOME_HOST CURRENT_HOST ALERT PARTITION_NUMBER LOGICAL_PORT NETNAME
-- ---- ----- --------- ------------ ----- ---------------- ------------ -------
0 MEMBER STARTED hostA hostA YES 0 0 hostA-ib0
1 MEMBER STARTED hostB hostB NO 0 0 hostB-ib0
2 MEMBER STARTED hostC hostC NO 0 0 hostC-ib0
128 CF PRIMARY hostD hostD NO - 0 hostD-ib0
129 CF PEER hostE hostE NO - 0 hostE-ib0
Troubleshooting steps
To help troubleshoot the restart light failure on hostB, take one or both of the following steps:
• Check the db2diag log file for information about the failure, and then investigate it.
The following sample output shows the restart light attempt on hostB:
Check the diag messages to analyze the errors corresponding to the restart light failure on hostB.
• See “Diagnosing a host reboot with a restart light” on page 610 for steps to diagnose the host failure
on hostA.
• See “Diagnosing a cluster file system failure that occurred during restart light” on page 595 for an
example on how to troubleshoot this scenario.
• After you diagnose the problem, clear the alert for the member.
Diagnosing a cluster file system failure that occurred during restart light
A member attempts to perform a restart light, but a cluster file system error occurs, which causes the
restart light to fail.
Symptoms
The objective of this topic is to diagnose the cause of the failure. This is a sample output from the
db2instance -list command showing a three member, two cluster caching facility environment:
ID TYPE STATE HOME_HOST CURRENT_HOST ALERT PARTITION_NUMBER LOGICAL_PORT NETNAME
-- ---- ----- --------- ------------ ----- ---------------- ------------ -------
0 MEMBER RESTARTING hostA hostB No 0 1 hostB-ib0
1 MEMBER STARTED hostB hostB NO 0 0 hostB-ib0
2 MEMBER STARTED hostC hostC NO 0 0 hostC-ib0
128 CF PRIMARY hostD hostD NO - 0 hostD-ib0
129 CF PEER hostE hostE NO - 0 hostE-ib0
Member 0 has a state of RESTARTING on hostB. The home host for member 0 is hostA.
This output indicates that member 0 is actively performing a restart light on hostB.
HostA has a state of INACTIVE with a corresponding alert. This indicates an abnormal host shutdown,
such as a power failure, or a failure to access the host because of a network communication failure.
A subsequent db2instance -list output shows that the restart light failed on hostB. The member then
attempts the restart light on hostC and is successful. Member 0 is left in the WAITING_FOR_FAILBACK
state because hostA remains offline.
ID TYPE STATE HOME_HOST CURRENT_HOST ALERT PARTITION_NUMBER LOGICAL_PORT NETNAME
-- ---- ----- --------- ------------ ----- ---------------- ------------ -------
0 MEMBER WAITING_FOR_FAILBACK hostA hostC YES 0 1 hostC-ib0
1 MEMBER STARTED hostB hostB NO 0 0 hostB-ib0
2 MEMBER STARTED hostC hostC NO 0 0 hostC-ib0
128 CF PRIMARY hostD hostD NO - 0 hostD-ib0
129 CF PEER hostE hostE NO - 0 hostE-ib0
If hostA comes online, the host state will return to ACTIVE, and member 0 will fail back to hostA, as
shown in the following db2instance -list output:
ID TYPE STATE HOME_HOST CURRENT_HOST ALERT PARTITION_NUMBER LOGICAL_PORT NETNAME
-- ---- ----- --------- ------------ ----- ---------------- ------------ -------
0 MEMBER STARTED hostA hostA YES 0 0 hostA-ib0
1 MEMBER STARTED hostB hostB NO 0 0 hostB-ib0
2 MEMBER STARTED hostC hostC NO 0 0 hostC-ib0
128 CF PRIMARY hostD hostD NO - 0 hostD-ib0
129 CF PEER hostE hostE NO - 0 hostE-ib0
• Check the db2diag log file to analyze the error message corresponding to the restart light failure on
hostB. The following output is generated during the restart light attempt:
The output shows that the fcntl() operating system function was called and generated an error
message and code of EINPROGRESS (55) "Operation now in progress". The File System
Type is mmfs, which is the General Parallel File System (GPFS).
• Run the errpt -a operating system command to view the contents of the AIX errpt system log. In
the scenario, as shown in the following sample output, the AIX errpt log from hostB contains MMFS_*
messages (for example, MMFS_MOREINFO) that were generated at approximately the same time as the
aforementioned restart light message. These MMFS* messages indicate that the problem originated in
the GPFS subsystem.
LABEL: MMFS_MOREINFO
IDENTIFIER: E7AA68A1
Description
SOFTWARE PROGRAM ERROR
See the topic “Diagnosing a host reboot with a restart light” on page 610 for help in diagnosing the cause
of the host failure that initiated the restart light.
Member 1 experienced a problem with its home host, hostB, and is running in light mode on hostC.
Member 1 shows that an ALERT has occurred. Running db2cluster -cm -list -alert provides
information about the alert, as the following output example shows:
1.
Alert: Db2 member '1' is currently awaiting failback to its home host 'coral18'.
The cluster manager has restarted the Db2 member in restart light mode on another
host. The home host for the member is available however automatic failback to home
has been disabled.
Action: In order to move the member back to its home host, the alert must be manually
cleared with the command: 'db2cluster -cm -clear -alert'.
Impact: Db2 member '1' will not be able to service requests until this alert has
been cleared and the Db2 member returns to its home host.
When hostB is available, the state for the host is set to ACTIVE. Member 1 cannot fail back to hostB
because automatic failback is disabled; member 1 remains in the WAITING_FOR_FAILBACK state until an
administrator clears the alert and manually enables automatic failback by running the following
command:
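The command is not shown in this excerpt. A hedged sketch, on the assumption that the db2cluster
automatic failback option is the setting being enabled (confirm the option name in the db2cluster
command reference):
db2cluster -cm -set -option autofailback -value on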
The instance must be restarted for the setting of the previous command to become effective.
This output shows that Db2 cluster services could not restart the failed member on its home host or on
any other host in the cluster. As shown in the STATE column, member 0 is in the ERROR state. Manual
intervention is required, as indicated by the alert flagged for member 0.
For information about diagnosing this symptom, see "Host failed and restart light does not work on any
host, failback did not happen."
Host failed and restart light does not work on any host
A member and a host fail. Restart light is not successful on any host; failback does not happen.
Find out why the member failed and why the restart light was unsuccessful.
• On the home host, check the db2diag log file and errpt files for FODC directories.
• Check for cluster caching facilities states by issuing the db2cluster command, as follows:
Member 0 has experienced a problem with its home host, hostA, and performed a successful restart light
on hostB.
Running the db2instance -list command at this point shows no alerts for member 0 after it performed a
restart light. This is indicated in the ALERT column for member 0. This means that member 0 was
successful on its first attempt at performing a restart light on hostB.
Because hostA is unavailable, the state for the host is set to INACTIVE. Member 0 cannot fail back while
hostA is in the INACTIVE state; member 0 therefore remains in the WAITING_FOR_FAILBACK state.
If hostA becomes available again, its state will change from INACTIVE to ACTIVE. Member 0 will fail back
to hostA, and its state will change from WAITING_FOR_FAILBACK to STARTED. To diagnose this
symptom, see "Diagnosing a host reboot with a restart light".
Member WAITING_FOR_FAILBACK, no alerts, the corresponding host is in ACTIVE state with an alert
The output of the db2instance -list command shows at least one member in
WAITING_FOR_FAILBACK state with no alerts, and the corresponding host is in ACTIVE state with an
alert.
This is a sample output from the db2instance -list command showing a three member, two cluster
caching facility environment:
ID TYPE STATE HOME_HOST CURRENT_HOST ALERT PARTITION_NUMBER LOGICAL_PORT NETNAME
-- ---- ----- --------- ------------ ----- ---------------- ------------ -------
0 MEMBER WAITING_FOR_FAILBACK hostA hostB NO 0 1 hostB-ib0
1 MEMBER STARTED hostB hostB NO 0 0 hostB-ib0
2 MEMBER STARTED hostC hostC NO 0 0 hostC-ib0
128 CF PRIMARY hostD hostD NO - 0 hostD-ib0
129 CF PEER hostE hostE NO - 0 hostE-ib0
This db2instance -list output shows that member 0 experienced a problem with its home host,
hostA, and performed a successful restart light on hostB.
Running the db2instance -list command at this point shows no alerts on member 0 after it performed a
restart light. This is indicated in the ALERT column for the member. This means member 0 was successful
on its first attempt at restart light, which was performed on hostB.
There is an alert flagged for hostA, which is an indication to the administrator to investigate a problem on
hostA. Although hostA is ACTIVE, there could be an issue preventing its usage in the cluster such as a file
system or network problem.
For information about diagnosing this symptom, see the related link "Host fails, restarted light
successfully, failback fails or cannot restart on home host".
A member cannot restart on the home host after a successful restart light
After a host failure, the member fails over to a guest host in restart light mode successfully, but then
cannot fail back, and you are unable to restart the member on the home host manually.
Case 2: SQL30108N
This case occurs if restarting the daemon fails with an error, resulting in a restart light onto another host
in the cluster.
The following scenario shows the symptoms for this error:
• The application returns an SQL30108N error message.
• Check for an accumulation of diagnostic data under the /var file system. Tivoli SA MP writes
diagnostic information into /var/ct/db2domain/log/mc/ (error logs), /var/ct/db2domain/run/mc/
(trace and core dumps), and /tmp/db2_cluster_manager_spooling (the default trace directory).
The following instructions give details on diagnosis and resolution:
• Check the db2diag log file for messages similar to the following one:
If you see the previous errors, this indicates that the Tivoli SA MP recovery resource manager
daemon experienced a problem. Diagnostic data will be written by Tivoli SA MP to diagnose the
problem.
• If there is a continuous accumulation of diagnostic data written into the /var/ct/
db2domain/log/mc/ and /var/ct/db2domain/run/mc/ directories, it is safe to archive old
diagnostic data to an archive location. For example:
– mv /var/ct/db2domain/run/mc/IBM.RecoveryRM/trace.6.sp /archivepath
where /var/ct/db2domain/run/mc/IBM.RecoveryRM/trace.6.sp is a Tivoli SA MP diagnostic
destination path
Note: /archivepath is an arbitrary archive file system
• It is important to monitor the /var/ct/db2domain/log/mc/ and /var/ct/db2domain/run/mc/
directories on a regular basis and maintain at least 3 GB of free space for the /var file system (see the
example commands after this list).
• IBM service and development teams use trace and core files for troubleshooting. If you would like IBM
Technical Support to analyze the diagnostic data, obtain a db2support package by running the
following command on each node in the cluster:
– db2support output_directory -d database_name -s
• Follow these instructions to upload data to IBM Technical Support:
– Submitting diagnostic information to IBM Technical Support for problem determination
• The IBM Technical Support website is a good source of information, where you can identify known
problems based on symptoms or error log messages
– Linux, UNIX, and Windows support.
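As a minimal sketch of the /var monitoring step mentioned in the list above (standard operating system commands; the 3 GB threshold is the one stated in this section):
df -g /var
du -sk /var/ct/db2domain/log/mc /var/ct/db2domain/run/mc
On Linux, df -h /var reports the same free-space information.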
Symptoms
A member sporadically experiences failures during startup or normal day-to-day OLTP processing. The
member is subsequently restarted successfully on its home host, or restarted light onto another host if
restarting on the home host is not possible. Informational stack traceback files, dump files, and other
data that is normally dumped is written into the diagpath and cf_diagpath.
A possible symptom is that there is an increased pace of disk space usage within the db2dump file system
due to the writing of such diagnostics data.
• Check the db2diag log file for Restart Light messages after the aforementioned diag messages. See
“Restart events that might occur in Db2 pureScale environments” on page 605 for more information
about the various restart messages including the Restart Light message.
• After locating the pdLogCfPrintf messages, search for the diag message string CF RC=; for example, CF
RC= 2148073491.
• Take the numeric value adjacent to this string; in this example, it is 2148073491. This value represents
the reason code from the network or communications layer.
• To find more details on this error, use the db2diag tool; for example, db2diag -cfrc 2148073491.
• Ping the cluster caching facility to see if it is online. If the ping is successful, gather a db2support
package by running db2support output_directory -d database_name -s on each host in the cluster and
contact IBM Technical Support.
• A uDAPL trace might be requested by IBM Service for diagnosing such problems, see “Running a trace
for uDAPL over InfiniBand connections” on page 585.
Symptoms
Note: The db2dump directory will be lost only if it is on the GPFS file system. If diagpath has been set to
a location outside GPFS, then this GPFS failure will not affect it.
This is a sample output from the db2instance -list command showing a three member, two cluster
caching facility environment, where there has been a host alert:
db2instance -list
ID TYPE STATE HOME_HOST CURRENT_HOST ALERT PARTITION_NUMBER LOGICAL_PORT NETNAME
-- ---- ----- --------- ------------ ----- ---------------- ------------ -------
0 MEMBER WAITING_FOR_FAILBACK hostA hostB NO 0 1 hostB-ib0
1 MEMBER STARTED hostB hostB NO 0 0 hostB-ib0
2 MEMBER STARTED hostC hostC NO 0 0 hostC-ib0
128 CF PRIMARY hostD hostD NO - 0 hostD-ib0
129 CF PEER hostE hostE NO - 0 hostE-ib0
• Confirm that you can access the file systems in question by using the ls or cd operating system
commands.
ls /db2cfs/db2inst1/sqllib/db2dump
cd /db2cfs/db2inst1/sqllib/db2dump
If the file systems are inaccessible or offline, these commands will return a message indicating that the
directory does not exist or is not available.
• If the file system is inaccessible when you run the ls or cd commands, confirm whether the file systems
are mounted on the problematic host.
– Using this example scenario, on hostA, run a command that lists the mounted file systems (see the
example after this bullet). If the file system is not mounted, /db2cfs/db2inst1/sqllib will not be shown
in the result set.
– To mount the file systems in question run the command db2cluster -cfs -mount -filesystem
fs_name
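A minimal sketch of this check, assuming the scenario's /db2cfs/db2inst1/sqllib path and a hypothetical file system name db2fs1:
mount | grep /db2cfs/db2inst1/sqllib
db2cluster -cfs -mount -filesystem db2fs1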
• Check the db2diag log file at the diagpath location. If you require further information about the
failure, look for relevant messages to ascertain the problem leading to the restart light. There might be a
db2diag log record corresponding to the time of the restart, for example
See “Restart events that might occur in Db2 pureScale environments” on page 605 for more information
about the various restart messages including the Restart Light message.
• Another source of reference for diagnosing problems is the system log. Run the operating system
command errpt -a to view the contents of the AIX errpt system log. In this example scenario, by
looking at the AIX errpt log from hostA just before the time of the aforementioned Restart Light
message, you see MMFS_* messages (for example, MMFS_GENERIC or MMFS_PHOENIX in errpt) with
text that is similar to the following text:
message: "GPFS: 6027-752 Lost membership in cluster hostA. Unmounting file
systems."
• Check the disks, disk connections, and fibre channel cards. The root cause in this example scenario was
faulty interconnects between the host and the SAN.
Note the Successful start of member string from function sqlhaStartPartition, which
indicates that a Local Restart has been initiated.
2. Member Restart Light
Note the Idle process taken over by member string, which indicates that a recovery idle process
was activated to perform a Restart Light, where member 1 failed over to hostC.
3. Member crash recovery
The Are we performing member crash recovery? Yes string shows that a member crash
recovery event occurred.
4. Group Restart
5. Group Crash Recovery
..............
• The following lines from the message log indicate a Group Crash Recovery event occurred.
This output shows that hostA has an alert set on it. To gather more information about the alert, use the
db2cluster command with the -cm -list -alert option. If the /var file system free space is less
than 25 MB on any host, the following alert message will appear for that host:
Alert: Host name hostA has 10 MB free space available on the /var file system.
Failure to write to /var due to no space will result in Db2 cluster failure.
Action: The file system requires a minimum of 25 MB of free disk space.
Free up space on the /var file system. The alert will be cleared automatically when a
sufficient amount of space becomes available on the file system.
Impact: Db2 Cluster Services may not function properly on the specified host and
may eventually lead to a Db2 instance failure.
Even though this alert is only informative, it is extremely important that you monitor this situation and
take action to increase the file system space on the affected host.
Another possibility is that Db2 is unable to gather information about the /var file system usage for a
host. In this case, the following alert message will appear for that host:
Alert: There was a failure to retrieve /var file system usage on host name
db2dev07. Check the db2diag.log for messages
concerning failures on host
'hostA' for /var file system.
Impact: Db2 cluster will fail to write information about important errors and
events. This will cause Db2 cluster failure.
Because it is essential that the host file system usage is monitored, you should take action as soon as
possible.
Like members, each cluster caching facility logs information into the cfdiag*.log files and dumps more
diagnostic data when required. The files reside in the directory set by the cf_diagpath database manager
configuration parameter or, if that parameter is not set, in the directory set by diagpath
($INSTHOME/sqllib_shared/db2dump/ $m by default).
• cluster caching facility Diagnostic Log Files (cfdiag-timestamp.cf_id.log)
– Each of these files keeps a log of the activities related to a cluster caching facility. Events, errors,
warnings, or additional debugging information will be logged here. This log has a similar appearance
to the db2diag log file. A new log is created each time a cluster caching facility starts.
– Note that there is a single static cluster caching facility diagnostic log name that always points to the
most current diagnostic logging file for each cluster caching facility and has the following format:
cfdiag.cf_id.log
• cluster caching facility Output Dump Diagnostic Files (cfdump.out.cf_pid.hostname.cf_id)
– These files contain information regarding cluster caching facility startup and stop. There might be
some additional output shown here.
• Management LWD Diagnostic Log File (mgmnt_lwd_log.cf_pid)
– This log file displays the log entries of a particular cluster caching facility's LightWeight Daemon
(LWD) process. Errors presented in this log file indicate the LWD has not started properly. A
successful start will not have ERROR messages in the log.
• cluster caching facility stack files (CAPD.cf_pid.tid.thrstk)
– These are stack files produced by the cluster caching facility when it encounters a signal. These files
are important for diagnosing a problem with the cluster caching facility.
• cluster caching facility trace files (CAPD.tracelog.cf_pid)
– A default lightweight trace is enabled for the cluster caching facility. These trace files appear
whenever the cluster caching facility terminates or stops. On their own, these files might hint at a
problem with the cluster caching facility, but only in combination with other diagnostic data can they
be useful in diagnosing errors.
A startup and initialization message will be shown in the cluster caching facility dump files. For example,
the message for cfdump.out.1548476.host04.128 contains the message that shows a successful
process start:
Look for the relevant cluster caching facility diagnostic log files by looking for the cfdiag log that has
the same CF ID as the failed cluster caching facility. For example, if CF ID 128 failed (as it did in the
previous db2instance -list command), use the following command:
$ ls cfdiag*.128.log
Note that cfdiag.128.log always points to the most current cfdiag log for CF 128. Look into
cfdiag-20091109015035000037.128.log (the previous cfdiag log) and the db2diag log file at a
time corresponding to 2009-11-10-02.30.22.000215 for errors.
The system error log for the affected host can also be consulted if the cause of the error is still unknown.
Log on to the host of the cluster caching facility that did not start and view the system error log by running
the errpt -a command (on Linux, look in the /var/log/messages file). In the example shown here, log in to hostD
because CF 128 experienced the failure.
This symptom can be caused by a member restart on the home host. Ongoing or repeated problems
that cause multiple member restarts on the home host can lead to growing space usage in the
diagpath diagnostic dump directory ($INSTHOME/sqllib/db2dump/ $m by default).
• Check the instance_owner.nfy log for information about when the failure occurred.
• Look for entries in this member db2diag log file around this timestamp for more details on why the
failure occurred.
Note: Check for error messages related to db2rstar in the db2diag log file.
• Look for FODC directories in the diagpath location (or sqllib/db2dump/ $m directory by default). If
there are FODC directories see the related link to "First occurrence data capture information" for
instructions on how to proceed.
• If there are no FODC directories and the cause of the error is still unknown, consult the system error log
for the affected host.
– Log in to each host
– On AIX, run the errpt -a command to view the system error log.
– On Linux, look in /var/log/messages
– Look for evidence of a host reboot event; see “Diagnosing a host reboot with a restart light” on page
610.
Diagnosis
This section describes how to identify a restart light that has occurred due to a host reboot.
A message will display in the db2diag log file showing a restart light event, for example
A message might display in the AIX error log showing a reboot based on the time of the db2diag log file
entry shown previously. Run errpt -a to access the AIX error log. The following scenarios are three
possible reasons for this occurrence:
• A user-initiated host shutdown and reboot has occurred.
LABEL: REBOOT_ID
IDENTIFIER: 2BFA76F6
Description
SYSTEM SHUTDOWN BY USER
Probable Causes
SYSTEM SHUTDOWN
Detail Data
USER ID
0
0=SOFT IPL 1=HALT 2=TIME REBOOT
0
TIME TO REBOOT (FOR TIMED REBOOT ONLY)
0
LABEL: TS_CRITICAL_CLNT_ER
IDENTIFIER: 75FA8C75
Description
Critical client blocked/exited
Probable Causes
Group Services daemon was blocked too long or exited
Failure Causes
Group Services daemon blocked: resource contention
Group Services daemon blocked: protocol problems
Group Services daemon exited: internal failure
Group Services daemon exited: critical client failure
Recommended Actions
Group Services daemon blocked: reduce system load
Group Services daemon exited: diagnose Group Services
Detail Data
DETECTING MODULE
rsct,monitor.C,1.124.1.3,5520
ERROR ID
6plcyp/dyWo7/lSx/p3k37....................
REFERENCE CODE
LABEL: KERNEL_PANIC
IDENTIFIER: 225E3B63
Description
SOFTWARE PROGRAM ABNORMALLY TERMINATED
Recommended Actions
PERFORM PROBLEM DETERMINATION PROCEDURES
Troubleshooting
If the affected host is online, run the db2instance -list command. If the db2instance -list
output shows that the member is reported as WAITING_FOR_FAILBACK, look for alerts in the output. Check
the alerts; you might have to clear an alert before the member can fail back to its home host. If there is still
no failback, see “A member cannot restart on the home host after a successful restart light” on page 599.
Symptoms
The output of the db2instance -list command includes a pending failback operation, as shown in the
following example:
ID TYPE STATE HOME_HOST CURRENT_HOST ALERT PARTITION_NUMBER LOGICAL_PORT NETNAME
-- ---- ----- --------- ------------ ----- ---------------- ------------ -------
0 MEMBER WAITING_FOR_FAILBACK hostA hostB NO 0 1 hostB-ib0
1 MEMBER STARTED hostB hostB NO 0 0 hostB-ib0
2 MEMBER STARTED hostC hostC NO 0 0 hostC-ib0
128 CF PRIMARY hostD hostD NO - 0 hostD-ib0
129 CF PEER hostE hostE NO - 0 hostE-ib0
In the previous example, hostA has a state of INACTIVE, and its ALERT field is marked as YES. This output
of the db2instance -list command is seen when hostA is offline or rebooting. Because hostA, the home host
for member 0, is offline, member 0 has failed over to hostB. Member 0 is now waiting to fail back to its
home host, as indicated by the WAITING_FOR_FAILBACK state. After hostA is rebooted from the panic,
member 0 will fail back to hostA.
Another way to diagnose this type of problem is to check the system log. Run the OS command errpt -a
to view the contents of the AIX errpt system log. In the AIX errpt log, you might see log entries similar to
the following example, which is for hostA:
LABEL: KERNEL_PANIC
IDENTIFIER: 225E3B63
Description
SOFTWARE PROGRAM ABNORMALLY TERMINATED
Recommended Actions
PERFORM PROBLEM DETERMINATION PROCEDURES
Detail Data
ASSERT STRING
5.1: xmemout succeeded rc=d
PANIC STRING
kx.C:2024:0:0:04A53FA8::advObjP == ofP->advLkObjP
If you see a KERNEL_PANIC log entry as shown in the previous example, the system reboot might be due
to an operating system kernel panic that was triggered by a problem in the IBM Spectrum Scale
subsystem. A kernel panic and system reboot can be the result of excessive processor usage or heavy
paging on the system when the IBM Spectrum Scale daemons do not receive enough system resources to
perform critical tasks. If you experience IBM Spectrum Scale filesystem outages that are related to kernel
panics, the underlying processor usage or paging issues must be resolved first. If you cannot resolve the
underlying issues, run the db2support command for the database with the -s parameter to collect
diagnostic information and contact IBM Technical Support.
Procedure
• To repair the cluster manager resource model, issue this command from an online host:
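A plausible form of this command, assuming the -repair -resources option of db2cluster, is:
db2cluster -cm -repair -resources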
Results
If the cluster resource model cannot be repaired, contact an IBM Service Representative for more
information about how to recover from this problem.
Procedure
1. Use the DB2INSTANCE environment variable to specify the target instance.
export DB2INSTANCE=<inst-name>
2. Issue the db2cluster command with the -repair -domain option while inside the install directory
or the sqllib/bin directory.
To obtain the CM domain name, run the db2cluster command: db2cluster -cm -list -domain.
(You can also obtain the domain name with the db2greg -dump command.)
If the cluster manager domain is in an unhealthy state or if there are resources still online, the
db2cluster command may fail and indicate that the command should be re-issued with the -force
option. Re-issuing the command with the -force option will successfully re-create the cluster
manager domain in these cases, but it will also reset the cluster host failure detection time to the
default value of 8 seconds. The host failure detection time can be subsequently re-adjusted in this
case.
Example
A DBA with Db2 cluster services authority needs to re-create a cluster manager domain, MYDOMAIN, in
Db2 instance MYINST.
export DB2INSTANCE=myinst1
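A sketch of the repair command for this example, using the -repair -domain option described in the preceding procedure:
db2cluster -cm -repair -domain MYDOMAIN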
As the domain is torn down and re-created, db2cluster issues informational messages about the
progress and the successful completion of the operation:
Procedure
• To set up the unhealthy host response:
• If you want to automatically reboot the host, issue the following command:
• If you want to automatically take any member or cluster caching facility on the host offline, issue
the following command:
Results
If you specified host reboot as the automatic response and the host is successfully rebooted, any member
on that host should be restarted there after the reboot, unless automatic failback is disabled. Any cluster
caching facility on the host should also be restarted on the host; however, if it was previously the primary
cluster caching facility, it will now be the secondary cluster caching facility. Information about the reboot
event is written to the syslog file. For more information, see the "Related links" section.
more information, see the "Related links" section.
If you specified that the member or cluster caching facility should be taken offline and this does occur,
the member or cluster caching facility will have an alert on it. The offlined member will not restart on or
fail back to its home host until this alert is removed. The offlined cluster caching facility will not restart
until this alert is removed.
Uninstallation
This section contains information that will help you understand and resolve problems that you might
encounter while uninstalling the IBM Db2 pureScale Feature.
Procedure
To clean up an incomplete Db2 pureScale instance drop:
1. Remove the RSCT Peer Domain (PD) forcefully:
• Try to run rmrpdomain -f domain_name.
If you do not know what the domain name is, run lsrpdomain and look for the domain name under
the Name column.
$ lsrpdomain
Name OpState RSCTActiveVersion MixedVersions TSPort GSPort
db2domain Online 2.5.3.5 No 12347 12348
installed_path/instance/db2iset -d instance_name
Procedure
1. List all of the existing peer domains by running the lsrpdomain command.
The output of this command should be similar to the following one:
lsrpdomain
If nothing is listed, there is no active peer domain. Only one RSCT peer domain can be active (online) at
any given time, and any operations (stopping, removing, or adding a node) will only affect this online
domain.
2. List all of the nodes in this active peer domain by running the db2cluster -cm -list -host -state
command.
The output of this command should be similar to the following output:
HOSTNAME STATE
----------- ------
coralpib135 ONLINE
coralpib136 ONLINE
3. Remove the entire peer domain, including all nodes. The peer domain must be online for the following
command to be successful. Run the db2cluster -cm -delete -domain db2domain command.
The output of this command should be similar to the following output:
4. To confirm that the peer domain has been removed appropriately, run the db2cluster -cm -list
-domain command.
What if the db2idrop command failed to remove the host from the GPFS cluster?
To help determine the cause of the failure, review the db2cluster command log file in the
DB2DIR/tmp/ibm.db2.cluster.* directory. After resolving the problem, reissue the db2idrop
command.
What if the db2idrop command failed because the instance is not usable?
To help determine the cause of the failure, review the db2idrop command log in the DB2DIR/tmp
directory. Check if instance_user/sqllib/db2nodes.cfg is valid.
What if the db2idrop command failed when removing the GPFS CLUSTER global registry (or the PEER
DOMAIN global registry) on specific hosts?
If this failure occurs, contact IBM Software Support.
What if the db2idrop command failed to remove the RSCT peer domain?
If the db2idrop command failed to remove the RSCT peer domain, you will have to manually remove it
by following these steps:
1. Check whether there are still resources attached to the peer domain by running the lssam command.
2. If there are still resources attached, perform the following steps:
a. Switch to the instance owner by entering su - instance_owner.
b. Remove the resource by entering db2cluster -cm -delete -resources.
c. Switch back to root.
3. Remove RSCT peer domain by running the db2cluster -cm -delete -domain domain_name
command from the DB2DIR/bin directory. Run the lsrpdomain command to determine the domain
name to specify.
For more information, see Manually cleaning up an IBM Reliable Scalable Cluster Technology peer
domain.
db2trc on -m '*.*.CIE.*.*'
To help diagnose severe errors, it might also help to look in the db2diag log file.
Additional tracing facilities are available for the Text Search server. For details, see the topic about
logging and tracing for the Db2 Text Search server.
2. Collect log files that are stored in the logpath configured for the text search server.
3. In some cases, you might want to capture more detailed information:
• Specify your preferred command-line tool logging properties by editing the
ecmts_config_logging.properties file in the directory <absolute_path_to_config_folder>.
Problem: You receive message CIE00311 telling you that an internal file cannot be opened. This message
can indicate a missing configuration directory, or indicate that a file may have been lost or corrupted, for
example, due to a disk full error on the file system where db2tss/config is located, or because of a
problem during the backup of the db2tss/config directory.
Solution: Check to see if the instance has a config directory for text search.
• If the config directory is missing, ensure that Db2 Text Search was installed and configured.
• If the config directory is in another location, add a symbolic link to it (UNIX).
If this error is being caused by a missing or corrupted file, contact IBM Support.
Problem: A text index update with a large number of documents to process fails with an 'insufficient
memory' message in the db2diag.log file.
Solution: If it is not feasible to increase available memory, decrease the documentqueueresultsize value
in the sysibmts.tsdefaults administrative view and try again.
Problem: You encounter message IQQG0037W in a query about a missing collection after a data
redistribution.
Solution: Ensure that the FOR DATA REDISTRIBUTION option is used the next time a text search UPDATE
INDEX command is issued.
Overview
The specification of the diagnostic data directory path or the alternate diagnostic data directory path,
using the diagpath or alt_diagpath database manager configuration parameters, determines which of
the following directory path methods is used for diagnostic data storage:
Primary diagnostic data directory path
All diagnostic data for members, cluster caching facilities, database partition servers, and database
partitions is logged to a private db2diag log file. This split diagnostic data directory path is the default
condition unless you specify the diagpath value with a valid path name and the $h, $n, or $m pattern
identifiers.
Alternate diagnostic data directory path
The alt_diagpath database manager configuration parameter is an alternate diagnostic data
directory path that provides a secondary path for storing diagnostic information. The path specified by
the alt_diagpath parameter is used only when the database manager fails to write to the path
specified in diagpath and ensures that important diagnostic information is not lost. For the alternate
diagnostic data directory path to be available, you must set the alt_diagpath configuration
parameter. For greater resiliency, it is recommended that you set this parameter to a path that is on a
different file system than diagpath.
Benefits
The benefit of specifying a single diagnostic data directory path is that diagnostic information from
several database partitions and hosts can be consolidated in a central location for easy access. The
benefit of using the default split diagnostic data directory path is that diagnostic logging performance can
be improved because there is less contention on the db2diag log file.
The benefits of specifying a secondary diagnostic data path, alt_diagpath, are:
• Increased resiliency to the loss of important diagnostic information.
• Compatibility with some of the tools used for diagpath, such as splitting.
Splitting a diagnostic data directory path by database partition server, database partition, or both
You can specify a diagnostic data directory path so that separate directories are created and named
according to the database partition server, database partition, or both.
You can specify a diagnostic data directory path to separately store diagnostic information according to
the database partition server or database partition from which the diagnostic data dump originated.
Procedure
• Splitting diagnostic data directory path per physical database partition server
• To specify a default diagnostic data directory path, execute the following step:
This command creates a subdirectory under the default diagnostic data directory with the
computer name, as shown in the following example:
Default_diagpath/HOST_db-partition-server-name
• To split a user specified diagnostic data directory path (for example, /home/usr1/db2dump/),
execute the following step:
- Set the diagpath database manager configuration parameter to split the /home/usr1/
db2dump/ diagnostic data directory path per database partition server by issuing the following
command:
/home/usr1/db2dump/HOST_db-partition-server-name
This command creates a subdirectory for each partition under the default diagnostic data directory
with the partition number, as shown in the following example:
Default_diagpath/NODEnumber
• To split a user specified diagnostic data directory path (for example, /home/usr1/db2dump/),
execute the following step:
- Set the diagpath database manager configuration parameter to split the /home/usr1/
db2dump/ diagnostic data directory path per database partition by issuing the following
command:
/home/usr1/db2dump/NODEnumber
• Splitting diagnostic data directory path per physical database partition server and per database
partition
• To specify a default diagnostic data directory path, execute the following step:
This command creates a subdirectory for each logical partition on the database partition server
under the default diagnostic data directory with the database partition server name and partition
number, as shown in the following example:
Default_diagpath/HOST_db-partition-server-name/NODEnumber
• To specify a user specified diagnostic data directory path (for example, /home/usr1/db2dump/),
execute the following step:
- Set the diagpath database manager configuration parameter to split the /home/usr1/
db2dump/ diagnostic data directory path per database partition server and per database partition
by issuing the following command:
/home/usr1/db2dump/HOST_db-partition-server-name/NODEnumber
For example, an AIX database partition server, named boson, has 3 database partitions with node
numbers 0, 1, and 2. The following example shows a sample list output for the directory:
usr1@boson /home/user1/db2dump->ls -R *
HOST_boson:
HOST_boson:
NODE0000 NODE0001 NODE0002
HOST_boson/NODE0000:
db2diag.log db2eventlog.000 db2resync.log db2sampl_Import.msg events usr1.nfy
HOST_boson/NODE0000/events:
db2optstats.0.log
HOST_boson/NODE0001:
db2diag.log db2eventlog.001 db2resync.log usr1.nfy stmmlog
HOST_boson/NODE0001/stmmlog:
stmm.0.log
HOST_boson/NODE0002:
db2diag.log db2eventlog.002 db2resync.log usr1.nfy
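A minimal sketch of the update commands for the preceding options, assuming the $h and $n pattern identifiers and the standard UPDATE DBM CFG syntax:
db2 update dbm cfg using diagpath '"$h"'
db2 update dbm cfg using diagpath '"/home/usr1/db2dump/ $h"'
db2 update dbm cfg using diagpath '"$n"'
db2 update dbm cfg using diagpath '"/home/usr1/db2dump/ $n"'
db2 update dbm cfg using diagpath '"$h$n"'
db2 update dbm cfg using diagpath '"/home/usr1/db2dump/ $h$n"'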
What to do next
Note:
• If a diagnostic data directory path split per database partition is specified ($n or $h$n), the NODE0000
directory will always be created for each database partition server. The NODE0000 directory can be
ignored if database partition 0 does not exist on the database partition server where the NODE0000
directory was created.
• To check that the setting of the diagnostic data directory path was successfully split, execute the
following command:
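A minimal sketch of such a check, assuming the standard GET DBM CFG command:
db2 get dbm cfg | grep -i diagpath
The returned diagpath value should include the pattern identifier (for example, $h$n) that you set.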
To merge separate db2diag log files to make analysis and troubleshooting easier, use the db2diag -merge
command. For additional information, see "db2diag - db2diag logs analysis tool command" in the
Command Reference and “Analyzing db2diag log files using db2diag tool” on page 459.
Configuration
The administration notification log files can be configured in size, location, and the types of events and
level of detail recorded, by setting the following database manager configuration parameters:
Legend:
1. A timestamp for the message.
2. The name of the instance generating the message.
3. For multi-partition systems, the database partition generating the message. (In a nonpartitioned
database, the value is "000".)
4. The process identifier (PID), followed by the name of the process, followed by the thread
identifier (TID) that are responsible for the generation of the message.
5. Identification of the application for which the process is working. In this example, the process
generating the message is working on behalf of an application with the ID
*LOCAL.DB2.020205091435.
This value is the same as the appl_id monitor element data. For detailed information about how
to interpret this value, see the documentation for the appl_id monitor element.
Setting the error capture level for the administration notification log file
This task describes how to set the error capture level for the administration notification log file.
Procedure
• To check the current setting, issue the GET DBM CFG command.
Look for the following variable:
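A minimal sketch of this check, assuming that the NOTIFYLEVEL database manager configuration parameter controls the capture level for the administration notification log:
db2 get dbm cfg | grep -i notifylevel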
Overview
Because Db2 diagnostic and administration notification messages are both logged in the db2diag log
files, the db2diag log files are often the first location to examine in order to obtain information about the
operation of your databases. Help with the interpretation of the contents of these diagnostic log files is
provided in the topics listed in the "Related links" section. If your troubleshooting attempts are unable
to resolve your problem and you feel you require assistance, you can contact IBM Software Support (for
details, see the "Contacting IBM Software Support" topic). When you gather the diagnostic information
that IBM Software Support requests, expect to include your db2diag log files among other sources of
information, such as other relevant logs, storage dumps, and traces.
The db2diag log file can exist in two different forms:
Configuration
The db2diag log files can be configured in size, location, and the types of diagnostic errors recorded by
setting the following database manager configuration parameters:
diagsize
The value of diagsize determines which form of diagnostic log file is adopted. If the value is 0, a
single diagnostic log file is used. If the value is not 0, rotating diagnostic log files are used, and this
nonzero value also specifies the total size of all rotating diagnostic log files and all rotating
administration notification log files. The instance must be restarted for the new value of the
diagsize parameter to take effect. See the "diagsize - Diagnostic log file size configuration
parameter" topic for complete details.
diagpath
Diagnostic information can be specified to be written to db2diag log files in the location defined by
the diagpath configuration parameter. See the "diagpath - Diagnostic data directory path
configuration parameter" topic for complete details.
alt_diagpath
The alt_diagpath database manager configuration parameter provides an alternate diagnostic data
directory path for storing diagnostic information. If the database manager fails to write to the path
specified by diagpath, the path specified by alt_diagpath is used to store diagnostic information.
diaglevel
The types of diagnostic errors written to the db2diag log files can be specified with the diaglevel
configuration parameter. See the "diaglevel - Diagnostic error capture level configuration parameter"
topic for complete details.
Note: If the diagsize configuration parameter is set to a non-zero value, that value specifies the total
size of the combination of all rotating administration notification log files and all rotating diagnostic log
files contained within the diagnostic data directory. For example, if a system with 4 database partitions
has diagsize set to 1 GB, the maximum total size that the combined notification and diagnostic logs can
reach is 4 GB (4 x 1 GB).
Legend:
1.
A timestamp and timezone for the message.
Note: Timestamps in the db2diag log files contain a time zone. For example:
2006-02-13-14.34.35.965000-300, where "-300" is the difference between UTC (Coordinated
Universal Time, formerly known as GMT) and local time at the application server in minutes. Thus
-300 represents UTC - 5 hours, for example, EST (Eastern Standard Time).
2.
The record ID field. The recordID of the db2diag log files specifies the file offset at which the current
message is being logged (for example, "27204") and the message length (for example, "655") for the
platform where the Db2 diagnostic log was created.
3.
The diagnostic level of the message. The levels are Info, Warning, Error, Severe,
Critical, and Event.
4.
The process ID
5.
The thread ID
6.
The process name
7.
The name of the instance generating the message.
8.
For multi-partition systems, the database partition generating the message. (In a non-partitioned
database, the value is "000".)
9.
The database name
10.
The application handle. This value aligns with that used in db2pd output and lock dump files. It
consists of the coordinator partition number followed by the coordinator index number, separated by
a dash.
11.
Identification of the application for which the process is working. In this example, the process
generating the message is working on behalf of an application with the ID
9.26.54.62.45837.070518182042.
A TCP/IP-generated application ID is composed of three sections.
The fields that were not already explained in the example are:
• <source> Indicates the origin of the logged error. (You can find it at the end of the first line in the
sample.) The possible values are:
– origin - message is logged by the function where error originated (inception point)
– OS - error has been produced by the operating system
– received - error has been received from another process (client/server)
– sent - error has been sent to another process (client/server)
• MESSAGE Contains the message being logged. It consists of:
The Informational record is output for db2start on every logical partition. This results in multiple
informational records: one per logical partition. Since the informational record contains memory values
which are different on every partition, this information might be useful.
Procedure
• To check the current setting, issue the command GET DBM CFG.
Look for the following variable:
• To change the value dynamically, use the UPDATE DBM CFG command.
To change a database manager configuration parameter online:
For example:
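A hedged illustration, using the diaglevel parameter and a hypothetical instance name db2inst1:
db2 attach to db2inst1
db2 update dbm cfg using DIAGLEVEL 4
db2 detach
Attaching to the instance first allows the change to take effect dynamically for parameters that are configurable online.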
4. Review the console output, especially the types of information that are collected.
You should see output like this (when run on Windows):
...
Collecting "System files"
"db2cache.prf"
"db2cos9402136.0"
"db2cos9402840.0"
"db2dbamr.prf"
"db2diag.bak"
"db2eventlog.000"
"db2misc.prf"
"db2nodes.cfg"
"db2profile.bat"
"db2systm"
"db2tools.prf"
"HealthRulesV82.reg"
"db2dasdiag.log"
...
Collecting "Detailed operating system and hardware information"
Collecting "System resource info (disk, CPU, memory)"
Collecting "Operating system and level"
Collecting "JDK Level"
Collecting "Db2 Release Info"
Collecting "Db2 install path info"
Collecting "Registry info"
...
Creating final output archive
"db2support.html"
"db2_sqllib_directory.txt"
"detailed_system_info.html"
"db2supp_system.zip"
"dbm_detailed.supp_cfg"
"db2diag.log"
db2support is now complete.
An archive file has been produced: "db2support.zip"
5. Now use a Web browser to view the detailed_system_info.html file. On each of your systems,
identify the following information:
• Number of CPUs
• Operating system level
• User environment
• User resource limits (UNIX ulimit command)
Exercise 2: Locating environment information in a Db2 trap file
1. Ensure a Db2 instance is started, then issue
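The db2pd tool is one plausible way to produce these call stacks on demand (an assumption for this exercise):
db2pd -stack all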
The call stacks are placed in files in the diagnostic directory (as defined by the diagpath database
manager configuration parameter).
2. Locate the following in one of the trap files:
...
<DB2TrapFile version="1.0">
<Trap>
<Header>
Db2 build information: Db2 v9.7.800.683 n130210 SQL09078
timestamp: 2013-03-15-10.32.37.894000
uname: S:Windows
comment: IP23428
process id: 7224
thread id: 6032
</Header>
<SystemInformation>
Number of Processors: 2
Processor Type: AMD64 Family 6 Model 44 Stepping 2
OS Version: Microsoft Windows Longhorn, Service Pack 1 (6.1)
Current Build: 7601
</SystemInformation>
<MemoryInformation>
<Usage>
Physical Memory: 8191 total, 2545 free.
Virtual Memory : 8388607 total, 8387728 free.
Paging File : 16381 total, 11030 free.
Ext. Virtual : 0 free.
</Usage>
</MemoryInformation>
<EnvironmentVariables>
<![CDATA[
[e] DB2PATH=D:\SQLLIB
[n] DB2INSTPROF=C:\ProgramData\IBM\DB2\db2build
[g] DB2_EXTSECURITY=YES
[g] DB2_COMMON_APP_DATA_PATH=C:\ProgramData
[g] DB2SYSTEM=JTANG
[g] DB2PATH=D:\SQLLIB
[g] DB2INSTDEF=DB2
[g] DB2ADMINSERVER=DB2DAS00
]]></EnvironmentVariables>
2005-10-14-10.56.21.523659
PID : 782348 TID : 1 PROC : db2cos
INSTANCE: db2inst1 NODE : 0 DB : SAMPLE
APPHDL : APPID: *LOCAL.db2inst1.051014155507
FUNCTION: oper system services, sqloEDUCodeTrapHandler, probe:999
EVENT : Invoking /home/db2inst1/sqllib/bin/db2cos from
oper system services sqloEDUCodeTrapHandler
Trap Caught
OSName: AIX
NodeName: n1
Version: 5
Release: 2
Machine: 000966594C00
...
The db2diag log files will contain entries related to the occurrence as well. For example:
Dump files
Dump files are created when an error occurs for which there is additional information that would be useful
in diagnosing a problem (such as internal control blocks). Every data item written to the dump files has a
timestamp associated with it to help with problem determination. Dump files are in binary format and are
intended for IBM Software Support representatives.
Note: For partitioned database environments, the file extension identifies the partition number. For
example, the following entry indicates that the dump file was created by a Db2 process running on
partition 10:
SingleQuery_955_timestamp_memberNumber (script: db2cos.bat)
An operation that requires shared sort memory exceeded the database sort heap size (sortheap) and
failed with SQL955N.
SharedSort_955_timestamp_memberNumber (script: db2cos.bat)
Concurrent operations requiring shared sort memory exceeded the database sort heap threshold for
shared sorts (sheapthres_shr). This caused a query to fail with SQL955N.
WLM_Queue_timestamp_memberNumber (script: db2cos.bat)
The workload manager admission control feature queued a statement for longer than expected. This
triggered an internal diagnostic process that created this FODC package. No action is required.
FODC_Trap_timestamp_memberNumber (script: db2cos_trap.bat)
An instance wide trap occurred.
FODC_Panic_timestamp_memberNumber (script: db2cos.bat)
The engine detected an incoherence and decided not to continue.
FODC_BadPage_timestamp_memberNumber (script: db2cos_datacorruption.bat)
A bad page was detected.
FODC_DBMarkedBad_timestamp_memberNumber (script: db2cos.bat)
A database has been marked bad due to an error.
FODC_[Index|Data|Col]Error_directory_timestamp_PID_EDUID_memberNumber (script:
db2cos_[index|data|col]error_long.bat or db2cos_[index|data|col]error_short.bat)
An EDU wide index error occurred.
FODC_Connections_timestamp_memberNumber (script: db2cos_threshold.bat)
User invoked db2fodc -connections to collect connection-related diagnostic data, used to diagnose
problems such as sudden spikes in the number of applications in the executing or compiling state or
new database connections being denied.
FODC_Cpu_timestamp_memberNumber (script: db2cos_threshold.bat)
User invoked db2fodc -cpu to collect processor-related performance and diagnostic data, used to
diagnose problems such as high processor utilization rates, a high number of running processes, or
high processor wait times.
FODC_Hang_timestamp_memberList (script: db2cos_hang.bat)
User invoked db2fodc -hang to collect data for hang (or severe performance) troubleshooting.
FODC_Memory_timestamp_memberNumber (script: db2cos_threshold.bat)
User invoked db2fodc -memory to collect memory-related diagnostic data, used to diagnose problems
such as no free memory available, swap space being used at a high rate, excessive paging, or a
suspected memory leak.
FODC_Preupgrade_timestamp_memberNumber (script: db2cos_preupgrade.bat)
User invoked db2fodc -preupgrade to collect performance-related information before a critical upgrade
or update, such as upgrading an instance or updating to the next fix pack.
db2set DB2FODC=FODCPATH=/home/hotel49/juntang/FODC
If you now want to change the FODC path dynamically on member 1 and member 2, you use the following
db2pdcfg commands. These settings are effective immediately and remain in memory until the instance
is recycled.
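A hedged sketch of those commands, assuming the -fodc FODCPATH option of db2pdcfg and the member-specific paths shown in the output that follows:
db2pdcfg -fodc FODCPATH=/home/hotel49/juntang/FODC/FODC1/ -member 1
db2pdcfg -fodc FODCPATH=/home/hotel49/juntang/FODC/FODC2/ -member 2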
If you want to know what the current FODC settings are for each member or partition in a system, you can
use the db2pdcfg -fodc -member all command (in the example, output is abridged and only the
FODC path output is shown):
Database Member 0
FODC package path (FODCPATH)= /home/hotel49/juntang/FODC/FODC0/
Database Member 1
FODC package path (FODCPATH)= /home/hotel49/juntang/FODC/FODC1/
Database Member 2
FODC package path (FODCPATH)= /home/hotel49/juntang/FODC/FODC2/
Full details about this ZRC value can be obtained using the db2diag command, for example:
ZRC class :
Critical Media Error (Class Index: 6)
Component:
SQLO ; oper system services (Component Index: 15)
Reason Code:
10 (0x000A)
Identifier:
SQLO_FNEX
SQLO_MOD_NOT_FOUND
Identifier (without component):
SQLZ_RC_FNEX
Description:
File not found.
Associated information:
Sqlcode -980
SQL0980C A disk error occurred. Subsequent SQL statements cannot be
processed.
The same information is returned if you issue the commands db2diag -rc -2045837302 or db2diag
-rc SQLO_FNEX.
An example of the output for an ECF return code is as follows:
ECF Set :
setecf (Set index : 1)
Product :
Db2 Common
Component:
OSSe
Identifier:
ECF_LIB_CANNOT_LOAD
Description:
Cannot load the specified library
The most valuable troubleshooting information in the db2diag command output is the description and
the associated information (for ZRC return codes only).
For a full listing of the ZRC values, use the db2diag -rc zrc command and for a full listing of the ECF
values, use the db2diag -rc ecf command.
Introduction to messages
Messages convey event information to users. A message describes the cause of an event and any actions
you can take in response to an event.
Note: The messages listed are specific for this product. For the full list of available messages, refer to the
Db2 knowledge center section on Introduction to messages.
Messages typically contain the following information:
• The nature of an event.
• The possible cause of an event.
• The possible action that you can take to resolve or avoid an event.
Message structure
When an event occurs, messages can be printed to diagnostic or notification logs, to a console, or on a
graphical interface. These messages are typically presented in a standardized structure. A message
contains the following sections:
• A unique identifier
• Short text
• An explanation section
• A user response
Unique message identifier
Identifiers consist of a three-character message prefix, followed by a four- or five-digit message
number, followed by a single-letter suffix. For example, SQL1042C.
Message identifier prefix
Three characters that identify the category of a message. For example, the ADM message
identifier prefix identifies a message as an administration notification message.
Message number
Unique four or five-digit number that identifies the message.
Single letter suffix
A single character that indicates the type of event message.
C
Indicates a severe error message.
For messages with an SQL message identifier prefix, the C suffix indicates a critical error
message.
E
Indicates an urgent error message. The E suffix is for non-SQL messages.
N
Indicates an error message.
Explanation section
One or more paragraphs that provide context and explain the event in more detail.
User response
A paragraph that provides you with directions to handle the event that generated the message.
Operating systems
Every operating system has its own set of diagnostic files to keep track of activity and failures. The most
common (and usually most useful) is an error report or event log. Here is a list of how this information can
be collected:
• AIX: the error report logs are accessed using the /usr/bin/errpt -a command; the system logs are
enabled using the /etc/syslog.conf file
• Linux: the /var/log/messages* files or the /bin/dmesg command
• HP-UX: the /var/adm/syslog/syslog.log file or the /usr/bin/dmesg command
• Windows: the system, security, and application event log files and the windir\drwtsn32.log file
(where windir is the Windows install directory)
There are always more tracing and debug utilities for each operating system. See your operating system
documentation and support material to determine what further information is available.
Hardware
Hardware devices usually log information into operating system error logs. However, sometimes
additional information is required. In those cases, you must identify what hardware diagnostic files and
utilities might be available for each piece of hardware in your environment. An example of such a case is when
a bad page, or a corruption of some type is reported by Db2. Usually this is reported due to a disk
problem, in which case the hardware diagnostics must be investigated. See your hardware
documentation and support material to determine what further information is available.
Some information, such as information from hardware logs, is time-sensitive. When an error occurs, you
should make every effort to gather as much information as you can from the relevant sources as soon as
possible.
In summary, to completely understand and evaluate a problem, you might have to collect all information
available from Db2, your applications, the operating system and underlying hardware. The db2support
tool automates the collection of most Db2 and operating system information that you will require, but you
should still be aware of any information outside of this that might help the investigation.
When a core file is generated, it can impose significant processor usage on the system, which in turn can
affect system availability. If the performance impact to system availability during core file generation is
unacceptable, you can disable core file generation, but it is recommended that you do not disable it
permanently. Core files contain diagnostic information that can be required in order to troubleshoot a
problem successfully. If diagnostic information is not available because core file generation was turned
off permanently, troubleshooting a problem with your data server might become impossible. For example,
to turn core file generation off dynamically, effective immediately and until the instance is recycled, issue
the following command:
db2pdcfg DB2FODC="DUMPCORE=OFF"
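1. Invoke the dbx debugger against the core file. A minimal sketch of the assumed invocation:
dbx program_name core_filename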
program_name is the name of the program that terminated abnormally, and core_filename is the name
of the file containing the core file dump. The core_filename parameter is optional. If you do not specify
it, the default name "core" is used.
2. Examine the call stack in the core file. Information about how to do this can be obtained by issuing
man dbx from a UNIX command prompt
3. To end the dbx command, type quit at the dbx prompt.
Example
The following example shows how to use the dbx command to read the core file for a program called
"main".
1. At a command prompt, enter:
dbx main
3. The name of the function that caused the core dump is "freeSegments". Enter where at the dbx
prompt to display the program path to the point of failure.
(dbx) where
freeSegments(numSegs = 2, iSetId = 0x2ff7f730, pcAddress = 0x2ff7f758, line
136
in "main.c"
main (0x1, 2ff7f7d4), line 96 in "main.c"
In this example, the error occurred at line 136 of freeSegments, which was called from line 96 in
main.c.
4. To end the dbx command, type quit at the dbx prompt.
Procedure
• View the event logs using the Windows Event Viewer.
Procedure
• Export the event logs from the Windows event viewer.
• You can load the log-file format (*.evt) data back into an event viewer (for example, on another
workstation). This format is easy to work with since you can use the viewer to switch the
chronology order, filter for certain events, and advance forwards or backwards.
• You can open the text (*.txt) or comma-delimited (*.csv) format logs in most text editors. They also
avoid a potential problem with respect to timestamps. When you export event logs in .evt format,
the timestamps are in Coordinated Universal Time and get converted to the local time of the
workstation in the viewer. If you are not careful, you can overlook key events because of time zone
differences. Text files are also easier to search.
Procedure
• Locate the Dr. Watson log file.
The default path is <install_drive>:\Documents and Settings\All Users\Documents\DrWatson
Trap files
Db2 generates a trap file if it cannot continue processing because of a trap, segmentation violation, or
exception.
All signals or exceptions received by Db2 are recorded in the trap file. The trap file contains the function
sequence that was running when the error occurred. This sequence is sometimes referred to as the
"function call stack" or "stack trace." The trap file also contains additional information about the state of
the process when the signal or exception was caught.
A trap file is generated when an application is forced to stop while running a fenced threadsafe routine.
The trap occurs as the process is shutting down. This is not a fatal error and it is nothing to be concerned
about.
The files are located in the directory specified by the diagpath database manager configuration
parameter.
On all platforms, the trap file name begins with a process identifier (PID), followed by a thread identifier
(TID), followed by the partition number (000 on single partition databases), and concluded with
".trap.txt".
Example
If a trap file called "DB30882416.TRP" had been produced in your directory specified by the diagpath
database manager configuration parameter, you could format it as follows:
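A plausible invocation, assuming the db2xprt trap file formatting tool and a hypothetical output file name:
db2xprt DB30882416.TRP DB30882416.FMT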
Db2 documentation
Troubleshooting information can be found throughout the Db2 Information Center, as well as throughout
the PDF books that make up the Db2 library.
Getting fixes
A product fix might be available to resolve your problem. You can get fixes by following these steps.
Procedure
1. You can view fix lists and obtain fix packs from the following Web pages:
• IBM Support Portal: Downloads
• Fixes by version for DB2 for Linux, UNIX, and Windows
Procedure
Obtain the test fix from IBM Software Support and follow the instructions in the Readme file with respect
to installing, testing and removing (if necessary) the test fix.
When installing a test fix in a multi-partition database partition environment, the system must be offline
and all computers participating in the instance must be upgraded to the same test fix level.
Results
Procedure
Complete the following steps to contact IBM Software Support with a problem:
1. Define the problem, gather background information, and determine the severity of the problem. For
help, see the "Contacting IBM" in the Software Support Handbook: techsupport.services.ibm.com/
guides/beforecontacting.html
2. Gather diagnostic information.
3. Submit your problem to IBM Software Support in one of the following ways:
• Online: Click the ESR (Electronic Service Request) link on the IBM Software Support, Open Service
Request site: www.ibm.com/software/support/probsub.html
• By phone: For the phone number to call in your country/region, go to the Contacts page of the
Software Support Handbook: techsupport.services.ibm.com/guides/contacts.html
Note: If you require support for any IBM product that is packaged as part of the Db2 pureScale
software stack, then open a service request or problem management record (PMR) for the IBM Db2
pureScale Feature. Opening a PMR for the Db2 pureScale Feature helps resolve problems more
efficiently.
What to do next
If the problem you submit is for a software defect or for missing or inaccurate documentation, IBM
Software Support creates an Authorized Program Analysis Report (APAR). The APAR describes the
problem in detail. Whenever possible, IBM Software Support provides a workaround that you can
implement until the APAR is resolved and a fix is delivered. IBM publishes resolved APARs on the IBM
Software Support website daily, so that other users who experience the same problem can benefit from
the same resolution.
Procedure
• To submit files via FTP to the Enhanced Centralized Client Data Repository (EcuRep), follow these steps; a sample session appears after this list:
a) Package the data files that you collected into ZIP or TAR format, and name the package according
to your Problem Management Record (PMR) identifier.
Your file must use the following naming convention in order to be correctly associated with the
PMR: xxxxx.bbb.ccc.yyy.yyy, where xxxxx is the PMR number, bbb is the PMR's branch number, ccc
is the PMR's territory code, and yyy.yyy is the file name.
b) Using an FTP utility, connect to the server ftp.emea.ibm.com.
c) Log in as the userid "anonymous" and enter your email address as your password.
d) Go to the toibm directory. For example, cd toibm.
e) Go to one of the operating system-specific subdirectories. For example, the subdirectories include:
aix, linux, unix, or windows.
f) Change to binary mode. For example, enter bin at the command prompt.
g) Put your file on the server by using the put command, naming it according to the convention in step a. Your PMR is updated to list where the files are stored, using the format xxxxx.bbb.ccc.yyy.yyy (xxxxx is the PMR number, bbb is the branch number, ccc is the territory code, and yyy.yyy is the file name or type, such as tar.Z or xyz.zip). You can send files to the FTP server, but you cannot update them; if you must change a file later, create it under a new file name.
h) Enter the quit command.
• To submit files using the ESR tool:
a) Sign onto ESR.
b) On the Welcome page, enter your PMR number in the Enter a report number field, and click Go.
c) Scroll down to the Attach Relevant File field.
d) Click Browse to locate the log, trace, or other diagnostic file that you want to submit to IBM
Software Support.
e) Click Submit. Your file is transferred to IBM Software Support through FTP, and it is associated with
your PMR.
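The following sample FTP session illustrates the first of these options. The PMR number (12345), branch number (999), territory code (000), operating system subdirectory, and file name are hypothetical values that you would replace with your own:
ftp ftp.emea.ibm.com
(log in as anonymous, with your email address as the password)
cd toibm
cd linux
bin
put 12345.999.000.db2support.zip
quit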
What to do next
For more information about the EcuRep service, see IBM EMEA Centralized Customer Data Store Service.
For more information about ESR, see Electronic Service Request (ESR) help.
Procedure
To download files from IBM Support:
1. Change to the fromibm directory:
cd fromibm
2. Change to the directory that your IBM technical-support representative specified:
cd nameofdirectory
3. Set binary transfer mode:
binary
4. Use the get command to download the file that your IBM technical-support representative specified:
get filename.extension
5. End the FTP session:
quit
Procedure
To subscribe to Support updates:
1. Subscribe to the Db2 RSS feeds by pointing your browser to the URL for one of the RSS feeds and clicking Subscribe Now.
2. Subscribe to My Notifications by going to the IBM Support Portal and clicking My Notifications in the Notifications portlet.
3. Sign in using your IBM ID and password, and click Submit.
4. Identify what and how you want to receive updates.
a) Click the Subscribe tab.
b) Select the appropriate software brand or type of hardware.
Results
Until you modify your RSS feeds and My Notifications preferences, you receive notifications of updates
that you have requested. You can modify your preferences when needed.