DB2 DBA Checklist
Skill Level: Introductory
19 Apr 2004
Just like a high performance sports car, a database requires some checks to keep it
running optimally. This article is broken down into tasks or checks that can be run at
different intervals on your DB2® for Linux®, UNIX®, and Windows® database, to do
just that. Learn when to monitor and what you should be doing daily, weekly, and
monthly. Updated for DB2 9.
Introduction
While databases are becoming more and more self-aware and self-healing, they still
require some monitoring to keep them running as efficiently as possible. Just like
your car, a database requires some checks to keep it running optimally. This
document is broken down into tasks or checks that you can run at different time
intervals to ensure that your DB2 databases are running optimally, and detect
potential issues before they happen.
The first set of checks or tasks should be run every day to make sure there are no
current or imminent problems. The second set should be run weekly to check for
issues or problems that may have occurred during the week or are likely to occur in
the coming week. The final set of checks or tasks need not be run every day or
week, but should be run monthly to keep the system running without problems, and
to prevent further issues in the event that a problem does occur.
When capturing the information for analysis, make sure that the DB2 and operating
system information is captured at the same time, as you cannot correlate information
captured at different times.
When monitoring the system, take the snapshots over a period of time. Taking them
for only one or two minutes will not give a real view of the system activity. I
suggest that you take a snapshot every one to five minutes, for a period of at least
one hour.
For example, to capture the CPU, memory and other operating system usage
information on UNIX or Linux we use the tool vmstat. To capture the disk I/O
information we use the tool iostat. Both tools take the following parameters:
Parameter 1: The interval, in seconds, at which the tool captures the system
information
Parameter 2: The number of times that the tool should capture the system
information
To run vmstat and iostat and capture a snapshot every 5 minutes (300 seconds) for
8 hours (28,800 seconds) you can run the following commands:
NOTE: the -tx option on iostat is not supported on all UNIX/Linux versions, but is
useful since it embeds the timestamp for when the snapshot was taken.
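As a rough sketch of how the two parameters fit together, the Python snippet below derives the sample count from the interval and the total duration and builds the argument lists for both tools. The function name sampling_commands is hypothetical, and the -tx option is left out because it is not supported everywhere.

```python
def sampling_commands(interval_seconds, duration_seconds):
    """Return vmstat and iostat argument lists for a fixed sampling window."""
    if interval_seconds <= 0 or duration_seconds < interval_seconds:
        raise ValueError("interval must be positive and no longer than the duration")
    # Parameter 2 is the number of samples: total duration divided by the interval.
    count = duration_seconds // interval_seconds
    return [
        ["vmstat", str(interval_seconds), str(count)],
        ["iostat", str(interval_seconds), str(count)],
    ]


# Every 5 minutes (300 seconds) for 8 hours (28,800 seconds).
commands = sampling_commands(300, 8 * 60 * 60)
for argv in commands:
    print(" ".join(argv))
```

Each argument list can be passed to subprocess.run with its output redirected to a dated file, so that the operating system figures can later be lined up with the DB2 snapshots taken over the same window.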
Also make sure to capture the snapshots at normal/average workload times as well
as peak workload times. While it is important to ensure the normal workloads are
handled efficiently, it is also important to ensure that the system can handle the peak
workloads without overloading the server.
Windows Tools
On Windows, you can look at the CPU usage and memory usage in the Task
Manager, but you cannot capture this information into a file like you
can with vmstat and iostat.
DB2 tools
DB2 has a number of tools that can be used to monitor the activity of the databases
and instances. These include:
In Version 8, DB2 introduced two new features to help you monitor the health of your
DB2 systems: the Health Monitor and the Health Center. These tools add a
management by exception capability to DB2 9 by alerting you to potential system
health issues. This enables you to address health issues before they become real
problems that affect your system's performance.
The Health Monitor runs on the DB2 server and continually monitors the health of
the DB2 instance and databases. If the Health Monitor detects that a user-defined
threshold has been exceeded (for example, the available log space has dropped
below a set percentage of the total space available), or if it detects an abnormal
state for an object (for example, the DB2 instance is no longer running), the Health
Monitor will raise an alert.
The Health Center provides the graphical interface to the Health Monitor. You use it
to configure the Health Monitor, and to see the rolled up alert state of your instances
and database objects. Using the Health Center's drill-down capability, you can
access details about current alerts and obtain a list of recommended actions that
describe how to resolve the alert. You can also choose to follow a recommended
action right inside the tool. The Health Center is easily configured to show status line
health beacons and/or pop up a dialog box telling you that the Health Center has an
object in alert status.
DB2 maintains data about its operation, its performance, and the applications that
are accessing it. This data is maintained as the database manager runs, and can
provide important performance and troubleshooting information. For example, you
can find out:
To set the monitor switches within a session, use the UPDATE MONITOR
SWITCHES command or the sqlmon() API.
For example, to enable buffer pool, lock, and dynamic SQL statement monitoring,
turn on the monitor switches using the following command:
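A minimal sketch of building that command from a list of switch names is shown below, assuming the CLP is invoked through the db2 executable. The helper name monitor_switch_command is hypothetical; the switch names are the standard ones accepted by UPDATE MONITOR SWITCHES, with STATEMENT covering SQL statement monitoring.

```python
# Switch names accepted by UPDATE MONITOR SWITCHES.
VALID_SWITCHES = {"bufferpool", "lock", "sort", "statement", "table", "timestamp", "uow"}


def monitor_switch_command(switches, state="on"):
    """Build a CLP command that turns the given monitor switches on or off."""
    if state not in ("on", "off"):
        raise ValueError("state must be 'on' or 'off'")
    unknown = [name for name in switches if name.lower() not in VALID_SWITCHES]
    if unknown:
        raise ValueError(f"unknown monitor switches: {unknown}")
    settings = " ".join(f"{name.lower()} {state}" for name in switches)
    return f"db2 update monitor switches using {settings}"


print(monitor_switch_command(["bufferpool", "lock", "statement"]))
```

Remember that switches set this way apply only to the session that issued the command.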
You can access the data that the database manager maintains either by taking a
snapshot or by using an event monitor. You can take a snapshot in one of the
following ways:
Once an event monitor has been created and activated, it will collect information
about the database and any database applications when the specified event occurs.
An event is a change in database activity which can be caused by one of the
following:
Event monitors are created using the CREATE EVENT MONITOR statement and
will collect event information only when they are active. An event monitor is activated
and deactivated using the SET EVENT MONITOR STATE statement. The
EVENT_MON_STATE function will return the current state of the specified event
monitor.
When the CREATE EVENT MONITOR statement is executed, the definition of the
event monitor is created and stored in the system catalog tables.
Database tools / snapshots by themselves typically do not give you the complete
picture of your system performance. For example, a database may be 100%
optimally tuned, but will not be able to perform well if I/O contention is occurring on
the server. Therefore it is important to look at the complete picture to make sure the
entire system is performing well.
Daily procedures
Verify that all instances are up and running
2. export/set DB2INSTANCE=instancename and run db2start.
5. On Windows, check that the service for each DB2 instance is started.
The attach method can be easily scripted as long as all of the instances (that is,
NODEs) are cataloged on your workstation.
To use the ps command on UNIX and Linux you first need to telnet into each of the
servers.
The definition of consistent can be confusing, and how it is reported by the GET DB
CFG command often causes questions.
A good method is to successfully connect to all databases, because a successful
connection also makes an inconsistent database consistent and therefore reduces the
time for future connect requests. This can also be easily scripted, as long as all of
the databases are cataloged on your workstation.
It is important to make sure that there were no problems that occurred overnight.
Since Version 8, DB2 has been writing diagnostic messages to two places, the
Administration Notification Log (where the messages intended for DBAs are written),
and the DB2DIAG.LOG file (where the messages intended for the DB2 service team
are written).
On Linux and UNIX, the log is written to a file named <instance_ID>.nfy that is
located in the directory specified by the DIAGPATH instance level configuration
parameter. To view the notification log you can:
There is nothing worse than having a problem on your system, and deciding to
restore the most recent backup, and then finding that the backup was not taken or is
not complete. Therefore, it is important to check that the previous night's backup(s)
were successful and that they have been stored in a safe location.
The first step is to ensure that the backups were successful. This is done using the
List History command as follows:
This can be scripted so that it is run for all databases after the backups complete,
and the report emailed to you. You can then simply verify the report each morning.
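One way to script this, sketched here under the assumption that the database names are kept in a simple list, is to generate one LIST HISTORY BACKUP command per database, limited to entries since the previous day. The helper name and the 'sample' and 'payroll' database names are illustrative only.

```python
from datetime import date, timedelta


def backup_history_commands(databases, today):
    """Build one LIST HISTORY command per database, covering the last day."""
    # The SINCE clause takes a timestamp prefix; yyyymmdd is used here.
    since = (today - timedelta(days=1)).strftime("%Y%m%d")
    return [f"db2 list history backup since {since} for {name}" for name in databases]


for command in backup_history_commands(["sample", "payroll"], date(2006, 7, 15)):
    print(command)
```

The output of each command can be collected into a single report and mailed by whatever scheduler runs the script after the backup window.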
In the event that the whole server goes down for a sustained period of time, you may
need to revert to your disaster recovery plan, restore the database to another server,
maybe in another location. Therefore it is important that the backup images be
stored in a safe site, not only on the server where the backup is taken. This can be
easily accomplished by copying the backup image to a LAN drive, an NFS mounted
drive or to a tape device.
If your database is read only, or can be rebuilt from scratch easily, you likely do not
have log retain enabled so you can skip this step. However, for those transactional
databases where you cannot afford to lose any committed transactions, it is
important to make sure log retain is enabled, and that the logs are being archived
successfully so that the database can be rebuilt and the transactions replayed in the
event of a disaster.
While recovery may be the primary reason for verifying the logs are being archived
successfully, there is another important reason. If the logs are not archived, they will
remain in the LOGPATH. Since the LOGPATH is normally on a file system with a set
size, if the log files are not being archived, as new logs are created the file system
will be filled up. When this occurs, DB2 will be unable to create any more log files
and will therefore stop.
When a userexit is called to archive a log file, it will write information to two places.
The first place will be the userexit audit log where an entry will be written for every
archive log request received by the userexit. In the event of an error during the
userexit processing, a message will be written to the userexit error log file as well.
These log files are in the LOGPATH and are named ARCHIVE.LOG and
USEREXIT.ERR respectively.
To examine these logs you can easily write a script to grab the last 50 to 100 lines
from these files (using the tail command) for all instances and email them to you.
Then you can study them along with the recovery history information each morning.
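If the tail command is not available, for example on Windows, a rough Python equivalent is sketched below. The function name tail_lines is hypothetical; the file paths would be the ARCHIVE.LOG and USEREXIT.ERR files in each database's LOGPATH.

```python
from collections import deque


def tail_lines(path, count=50):
    """Return the last `count` lines of a text file, without trailing newlines."""
    with open(path, "r", errors="replace") as handle:
        # A bounded deque keeps only the most recent lines while reading.
        return [line.rstrip("\n") for line in deque(handle, maxlen=count)]
```

Calling tail_lines on each log and joining the results gives the body of the morning report.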
Whenever you work with other DBAs there is always a chance that someone else
might have changed one or more database or database manager configuration
parameters and forgotten to tell everyone else. Since changes to the database and
database manager configurations have a big impact on the system, verify that they
have not changed unexpectedly. Capture the DB and DBM configurations into a file
as follows:
Make sure that you capture the output of the commands to a file, and name the file
so that it has the date as part of the name, like DB_DBM_CFG.07152006.out. Then
you can use a tool like diff to compare the current output with the previous day's
output as follows:
This way, if there is a change you will see something like the following:
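The same comparison can be sketched in Python with the standard difflib module, which is handy where diff is not installed. The function name config_changes is hypothetical, and the two configuration lines in the example are made-up sample values, not output from a real system.

```python
import difflib


def config_changes(previous_text, current_text):
    """Return only the removed (-) and added (+) lines between two captures."""
    diff = difflib.unified_diff(
        previous_text.splitlines(),
        current_text.splitlines(),
        fromfile="previous",
        tofile="current",
        lineterm="",
        n=0,  # no context lines, only the changes themselves
    )
    return [
        line
        for line in diff
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    ]


# Illustrative captures: SORTHEAP was changed between the two days.
yesterday = " Sort list heap (4KB) (SORTHEAP) = 256\n Max storage for lock list (4KB) (LOCKLIST) = 50"
today = " Sort list heap (4KB) (SORTHEAP) = 512\n Max storage for lock list (4KB) (LOCKLIST) = 50"
for line in config_changes(yesterday, today):
    print(line)
```

An empty result means that nothing changed, so the script only needs to send mail when the list is not empty.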
While the buffer pool hit ratio is very important for OLTP systems, it is usually not
possible to have a high buffer pool hit ratio in a data warehouse, so examine the
measures that are important for your workload. Some of the important performance
measures, and how they can be calculated, are shown below.
Calculate the data, index and combined buffer pool hit ratios as well as the
asynchronous read percentage using the following statement:
Note: The nullif function is used in the query above to return a null when the value
inside the parentheses (i.e. pool_data_l_reads or pool_index_l_reads) is zero (0);
otherwise the calculation would cause a divide by zero error and the statement would
fail.
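To make the arithmetic concrete, the following sketch computes the data, index and combined hit ratios from the logical and physical read counters, returning None where SQL would return a null. It assumes the usual definition of the hit ratio (the share of logical reads that did not need a physical read); the function names are hypothetical and the asynchronous read percentage is not covered.

```python
def hit_ratio(logical_reads, physical_reads):
    """Percentage of logical reads satisfied without a physical read."""
    if logical_reads == 0:
        # Mirrors nullif(): no logical reads means the ratio is undefined.
        return None
    return (logical_reads - physical_reads) * 100 / logical_reads


def buffer_pool_hit_ratios(data_l_reads, data_p_reads, index_l_reads, index_p_reads):
    """Data, index and combined hit ratios for one buffer pool."""
    return {
        "data": hit_ratio(data_l_reads, data_p_reads),
        "index": hit_ratio(index_l_reads, index_p_reads),
        "combined": hit_ratio(
            data_l_reads + index_l_reads, data_p_reads + index_p_reads
        ),
    }


ratios = buffer_pool_hit_ratios(1000, 100, 500, 25)
print(ratios)
```

The four arguments correspond to the pool_data_l_reads, pool_data_p_reads, pool_index_l_reads and pool_index_p_reads monitor elements.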
Examine the usage patterns for the tables in your database using the statement
below. It shows how many rows were read and written, and the number of overflow
records accessed, for each table.
select
substr(table_schema,1,8) as Schema,
substr(table_name,1,30) as Table_Name,
rows_read,
rows_written,
overflow_accesses
from table (snapshot_table ('sample', -1) ) as snapshot_table;
Examine the overall database usage patterns using the query below. This query
examines:
• How many rows were read vs. selected
• How many lock waits occurred, the total lock wait time and the average
lock wait time
• How many deadlocks and lock escalations were detected
• How many sorts occurred, the total sort time, and the average sort time,
the percent of sorts that overflowed
select
db_name,
SNAPSHOT_TIMESTAMP,
rows_read,
rows_selected,
lock_waits,
lock_wait_time,
lock_wait_time/nullif(lock_waits,0) as avg_wt_time,
deadlocks,
lock_escals,
total_sorts,
total_sort_time,
total_sort_time/nullif(total_sorts,0) as avg_sort_time,
sort_overflows,
sort_overflows*100/nullif(total_sorts,0) as pct_ovflow_sorts
from table (snapshot_database (' ', -1) ) as snapshot_database;
With the growing ability for DB2 to adapt to changes in performance and usage
automatically, much of the everyday administration is no longer required, but you still
might want to look at, and understand, the changes DB2 has made to the database
configuration parameters as well as to the underlying table space allocation.
You can track the changes to the table space allocation using the list tablespaces
show detail command. The changes made by the automatic self tuning memory are
logged in files named stmm.#.log in the stmmlog directory. This directory is under
the SQLLIB directory for the instance owner on Linux and UNIX, and under the
SQLLIB\Instance directory on Windows.
Another important thing to look at on the server is the memory usage of DB2 and the
entire server. On Windows you can determine the amount of RAM on the server by
opening My Computer, and then selecting Help and About Windows.
In UNIX and Linux the command free will display the amount of real memory (RAM)
on the system, and how much is currently in use and available. In the case below
there is 1GB of real memory on this server, and approximately 717MB is allocated to
applications.
Study DB2
Nothing is more valuable in the long run than a DBA who is widely experienced,
and as widely read as possible. This study should include DBA manuals, magazines,
newsgroups and mailing lists.
The comp.databases.ibm-db2 news group is a great place to learn from, and share
information with, your fellow DBAs.
For more detailed information you should also look for our DB2 Certification Guide
series, as these books are very informative.
Weekly procedures
Look for new objects
It is important to know if people are creating new tables, indexes, stored procedures,
etc. in your production database. New objects typically indicate that a new
application has been installed on the server and any new applications and/or objects
will impact the operational characteristics of your system.
In addition, new objects will consume space within the database, so it is important to
identify these objects before they grow too large and could potentially fill a table
space. If these objects are not created by a DBA, they very likely may have been
created in the wrong table space, which can cause space and/or performance
issues.
There are a few alternatives available to check for any new objects within the
system:
For any differences, determine the CREATOR of the object from the catalog
table and trace the information back to the person who created the object.
Once you have optimized your database based on your current workload, there is
nothing more frustrating than getting a call that the database is not performing well,
and finding that the poor performance was caused by a new application, or changes
to existing applications that no one told you about. Unfortunately, this happens all
too often. By monitoring your database for new and/or changed applications you can
hopefully detect these changes before they cause performance problems.
To look for new applications you can use the list applications show detail
command. If you redirect the output of this command to a file and keep these files for
a period of time, you can compare the files every week to see if a new application
name suddenly appears in the output.
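Once the application names have been pulled out of two saved outputs, the comparison itself is a simple set difference. The sketch below assumes the names have already been extracted into lists; the helper name and the payroll.exe application are hypothetical.

```python
def new_application_names(previous_names, current_names):
    """Return application names present this week but not last week."""
    return sorted(set(current_names) - set(previous_names))


last_week = ["db2bp.exe", "javaw.exe"]
this_week = ["db2bp.exe", "javaw.exe", "payroll.exe"]
print(new_application_names(last_week, this_week))
```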
To look for changed applications you can see what SQL is running on your system
over time, and look for new SQL that has not been run previously. To do this you
can create a table as follows:
You can then retrieve the SQL statements from the current package cache and
insert them into a table for analysis using the following statement:
You can then examine this table for any SQL statements that have not been
executed previously using the statement:
In the output of this statement, any statement with a count of 1, and the timestamp
column showing the current date is one that has not been run previously.
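The logic of that check can be tried out without a DB2 system. The sketch below uses Python's built-in sqlite3 module purely as a stand-in database; the sql_history table, its columns and the sample statements are all hypothetical, and the real table would live in DB2 and be loaded from the package cache.

```python
import sqlite3


def first_time_statements(rows, today):
    """Return statements captured exactly once, and only on `today`.

    rows is an iterable of (capture_date, statement_text) pairs.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("create table sql_history (captured text, stmt_text text)")
    conn.executemany("insert into sql_history values (?, ?)", rows)
    cursor = conn.execute(
        "select stmt_text from sql_history "
        "group by stmt_text "
        "having count(*) = 1 and max(captured) = ? "
        "order by stmt_text",
        (today,),
    )
    result = [row[0] for row in cursor.fetchall()]
    conn.close()
    return result


history = [
    ("2006-07-08", "select * from org"),
    ("2006-07-15", "select * from org"),
    ("2006-07-15", "select * from staff"),
]
print(first_time_statements(history, "2006-07-15"))
```

A statement that appears in several captures has a count above 1 and is filtered out, which leaves only the SQL that showed up for the first time in the latest capture.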
As you insert, update and delete rows in your tables, the data in the tables may need
to be REORGed to:
The reorgchk tool will check your tables and indicate which tables may need to be
reorged. You can run the reorgchk tool against a single table, all user tables, all
tables in a specific schema, or all system catalog tables. You can also indicate
whether the tool should use the current statistics in the catalog tables as a basis, or
gather new statistics first.
To run the reorgchk tool against all of your tables, and ensure you are using the
current statistics, use the command:
You should redirect the output of this command to a file for further analysis.
When viewing the output of the reorgchk tool, find the F1, F2 and F3 columns for
your tables, and the F4, F5, F6, F7, and F8 columns for your indexes. An
asterisk (*) in any one of these columns indicates that DB2 has calculated that
your table or index currently breaches that threshold.
It is important to note that for tables, if you see an asterisk in any of the columns,
then you typically need to reorg the table. However, since many tables have more
than one index, by definition if one of them is 100% clustered, the other indices will
not be clustered. Therefore you need to investigate the index portion of the
reorgchk output in more detail and consider all of the indexes on the table when
determining whether or not to reorg the index.
F1: the percentage of rows that are overflow records. When this is greater than 5%
there will be an asterisk (*) in the F1 column of the output.
F2: the percentage of used space on the data pages. When this is less than 70%
there will be an asterisk (*) in the F2 column of the output.
F3: the percentage of pages in the table that contain some records. When this
is less than 80% there will be an asterisk (*) in the F3 column of the output.
F4: the cluster ratio, i.e. the percentage of rows in the table that are in the same
order as the index. When this is less than 80% there will be an asterisk (*) in the F4
column of the output.
F5: the percentage of space on each index page that is used for index keys.
When this is less than 50% there will be an asterisk (*) in the F5 column of the
output.
F6: the number of keys that can be stored on each index level. When this is less
than 100 there will be an asterisk (*) in the F6 column of the output.
F7: the percentage of record IDs (keys) on a page that have been marked as
deleted. When this is more than 20% there will be an asterisk (*) in the F7 column of
the output.
F8: the percentage of empty leaf pages in the index. When this is more than 20%
there will be an asterisk (*) in the F8 column of the output.
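The thresholds above can be summarized as a small lookup table. The sketch below encodes them exactly as described in this article and reports which formulas would be flagged for a given set of values; it does not reproduce how reorgchk itself computes each value, and the function name is hypothetical.

```python
# Direction and limit for each formula, as described in the text above.
THRESHOLDS = {
    "F1": ("above", 5),    # overflow rows, percent
    "F2": ("below", 70),   # used space on data pages, percent
    "F3": ("below", 80),   # pages containing records, percent
    "F4": ("below", 80),   # cluster ratio, percent
    "F5": ("below", 50),   # index page space used for keys, percent
    "F6": ("below", 100),  # keys per index level
    "F7": ("above", 20),   # deleted record IDs, percent
    "F8": ("above", 20),   # empty leaf pages, percent
}


def flagged_formulas(values):
    """Return the formulas whose value breaches its threshold."""
    flagged = []
    for name, value in values.items():
        direction, limit = THRESHOLDS[name]
        if direction == "above" and value > limit:
            flagged.append(name)
        elif direction == "below" and value < limit:
            flagged.append(name)
    return sorted(flagged)


print(flagged_formulas({"F1": 7, "F2": 85, "F3": 75}))
```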
When reorganizing a table you can optionally specify on which index DB2
should cluster the data. To reorg the ORG table based on the ORGX index, use the
command
The DB2 optimizer uses database statistics to determine the optimal access plans
for your SQL statements. When you make significant changes to the amount of data,
or to the data organization in your tables you should use the runstats tool to capture
new statistics and store them in the system catalogs. You should also be sure to
capture statistics for any new table or index.
To capture statistics for the ORG table, and its indexes you can use the command
NOTE: You must specify the schema for the table when using the runstats
command.
You can check for any tables or indexes without statistics, or with statistics that are
over 7 days old using the following statements:
select substr(name,1,30),substr(creator,1,10),stats_time
from sysibm.systables
where stats_time < ((current timestamp) - 7 days)
or stats_time is null;
select substr(name,1,30),substr(creator,1,10),stats_time
from sysibm.sysindexes
where stats_time < ((current timestamp) - 7 days)
or stats_time is null;
When considering which tables to run reorg or runstats on you should also
consider the activity on the tables. To find the 10 most read tables, based on the
number of rows read, use the following statement:
To find the 10 most updated tables, based on the number of rows written, use the
following statement:
select
substr(table_schema,1,8) as Schema,
substr(table_name,1,30) as Table_Name,
rows_written,
overflow_accesses,
page_reorgs
from table (SNAPSHOT_TABLE(' ',-1)) as snapshot_table
order by rows_written desc
fetch first 10 rows only;
These tables are also likely candidates for at least a runstats, if not a reorg and a
runstats.
It is good practice to clean up the diagnostic logs on a regular basis. In the event an
error does occur, you then do not need to go over 6 months of information in the
logs, and the logs are a lot smaller and easier to edit. Before purging the files, make
a copy of them in case you want to go back at some time in the future to investigate
what was occurring on the system at a given time.
On Windows you can save the event log to another file in the Event Viewer by
selecting the Action menu, and then choosing the Save Log File As option. You
can then purge the entries from the log by selecting the Action menu, and then
choosing the Clear All Events option.
NOTE: It is good practice to name the file with the current date to make it easier to
look back at the files at a later date.
For the DB2DIAG.LOG file as well as the administration notification log file on Linux
and UNIX, you should compress these files, and name them with the current date in
the file name as well.
On Linux or UNIX, you can tar the *.nfy and db2diag.log files together, and then use
either gzip or compress to reduce the size of the resulting file.
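A portable alternative, sketched below with Python's standard tarfile module, bundles the *.nfy files and db2diag.log into one gzip-compressed archive with the date in its name. The function name and the db2_diag_logs file name prefix are hypothetical; the directory arguments would be the DIAGPATH and wherever you keep old logs.

```python
import tarfile
from datetime import date
from pathlib import Path


def archive_diagnostic_logs(diag_dir, archive_dir, today):
    """Bundle *.nfy and db2diag.log into a dated .tar.gz and return its path."""
    diag_dir = Path(diag_dir)
    members = sorted(diag_dir.glob("*.nfy"))
    diag_log = diag_dir / "db2diag.log"
    if diag_log.exists():
        members.append(diag_log)
    # Date stamp in mmddyyyy form, e.g. 07152006.
    archive_path = Path(archive_dir) / f"db2_diag_logs.{today:%m%d%Y}.tar.gz"
    with tarfile.open(archive_path, "w:gz") as archive:
        for member in members:
            archive.add(member, arcname=member.name)
    return archive_path
```

The sketch only copies the files into the archive. Purging the originals is left as a separate, deliberate step once you have checked that the archive is readable.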
It is always good to know if there are any updates to the software that you are
running. If your system is running smoothly, you may not want to apply every fixpak
to your server. By reading the information about the fixes contained in the fixpaks /
service packs, you can make a more educated decision about whether or not to
apply a fixpak. If you are encountering issues, you can look at the fix descriptions
to see if one of the available fixes might be the solution to your problems.
From a DB2 perspective, the most important web site is the DB2 for Linux, UNIX,
and Windows Technical Support Page:
http://www-3.ibm.com/cgi-bin/db2www/data/db2/udb/winos2unix/support/download.d2w/WINV8FP
One way to be sure you find out when a new fixpak becomes available is to
subscribe to DB2 Alerts at this site:
http://www-3.ibm.com/cgi-bin/db2www/data/db2/udb/winos2unix/support/db2alert.d2w/report
Monthly procedures
Look for indicators of exceptional growth
Review your tables and table spaces to see how much they have grown in the past
month. By knowing how fast the tables and table spaces are growing, and how much
space is still available, you can detect potential space issues before they happen.
You can retrieve the size of the table space and the amount of space available using
the statement below.
You can see how big each of your tables is by looking at the system catalog tables.
As long as your statistics are current, this information will be accurate. To get the
size of your tables use the statement
select tabname,
npages
from syscat.tables
where tabname not like 'SYS%'
NOTE: If statistics have not been captured for a table, it will have a value of -1 for
npages.
Create a history table or a spreadsheet to store this information so that you can
scrutinize the space usage for your tables and table spaces over time. An easy way
to do this is to create an export statement using the select statements above, and
create a delimited ASCII (DEL) file which you can then import directly into a
spreadsheet.
Compare the information you have been gathering on the system level CPU,
memory, network, and disk utilization, as well as the DB2 object information that you
have been gathering, to identify trends that could lead to contention or a shortage of
resources.
Based on your analysis of the above information, you can plan for these situations
before they happen and take actions to prevent these situations from occurring.
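As one simple way to turn the monthly figures into a forecast, the sketch below estimates how many months remain before a table space fills up. It assumes roughly linear growth between the first and last sample, which is only a first approximation; the function name and the page counts in the example are hypothetical.

```python
def months_until_full(used_pages_by_month, usable_pages):
    """Estimate months remaining before used pages reach usable pages.

    used_pages_by_month holds one sample per month, oldest first.
    Returns None when the table space is not growing.
    """
    if len(used_pages_by_month) < 2:
        raise ValueError("at least two monthly samples are needed")
    # Average growth per month between the first and the last sample.
    months = len(used_pages_by_month) - 1
    growth = (used_pages_by_month[-1] - used_pages_by_month[0]) / months
    if growth <= 0:
        return None
    free_pages = max(usable_pages - used_pages_by_month[-1], 0)
    return free_pages / growth


print(months_until_full([1000, 1200, 1400], 2000))
```

A table space whose estimate drops below your lead time for adding storage is the one to act on first.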
The following appendices contain useful scripts that can be used to monitor your
system and database. Note that these scripts were written in files that were run
using the CLP, and therefore contain comments. The comments are preceded by
the double dashes (--) and need to be removed if you are running these commands
directly on the command line.
-- Insert the snapshot info into the tablespaceinfo table to be stored for analysis.
-- Display the table space type, i.e. DMS or SMS as a string, not the numeric
-- value in the info.
when tablespace_type = 0 then 'DMS'
when tablespace_type = 1 then 'SMS'
-- Only 0 and 1 are VALID, therefore return an error for anything else.
else 'Error'
end) as Managed_By,
(case
-- Display the type of data that can be stored in the table space, i.e. TEMP,
-- LARGE/LOB or ALL, not the numeric value in the info.
when tbs_contents_type = 2 then 'TEMP'
when tbs_contents_type = 1 then 'LARGE'
when tbs_contents_type = 0 then 'ALL' end) as Data_Type,
-- Also return the total_pages using the heading ALLOCATED PAGES,
total_pages as allocated_pages,
usable_pages,
used_pages,
free_pages,
page_size
from table (snapshot_tbs_cfg ('sample', -1) ) as snapshot_tbs_cfg
order by pct_free;
select tablespace_name,
date(timestmp) as dte,
pct_free
from tablespaceinfo
group by tablespace_name, pct_free, timestmp ;
select
substr(tablespace_name,1,12) as TBSPC_Name,
substr(Container_name,1,67) as Cont_Name,
(case
when container_type = 0 then 'SMS Directory'
when container_type = 6 then 'DMS File'
else 'DMS Device'
end) as Container_Type,
usable_pages
from table (snapshot_container (' ', -1) ) as snapshot_container;
-- assigned to the bufferpool. This can be used to help size the bufferpools more
-- appropriately.
-- Group by bpname first to get all table spaces for a bufferpool together as the table
-- spaces will be unique in this report.