SQL Basics of Indexes
SQL Server Indexes need to be effective. It is wrong to have too few or too many. The ones you create
must ensure that the workload reads the data quickly with a minimum of I/O. As well as a sound
knowledge of the way that relational databases work, it helps to be familiar with the Dynamic
Management Objects that are there to assist with your indexing strategy. In this extract from their book
Performance Tuning with SQL Server DMVs, Tim Ford and Louis Davidson cover the basics.
Defining an effective indexing strategy is the only way to ensure that the most significant and frequent queries in your
workload are able to read only the required data, and in a logical, ordered fashion, thus returning that data quickly and
efficiently, with minimal I/O. However, finding the correct balance between too many and too few indexes, and having the
"proper" set of indexes in place, is a delicate art. It requires sound knowledge of the database design, how the data
within the tables is distributed, and of the typical query patterns.
This is why the indexing-related set of Dynamic Management Objects (DMOs) is probably the most widely used of any
category. The indexing DMOs, all of which have names starting with sys.dm_db_, can help the DBA answer such
questions as the following (some of the relevant DMOs are indicated in brackets).
Are there any indexes that are no longer in use, or have never been used? (index_usage_stats)
For indexes that are in use, what is the usage pattern? (index_operational_stats)
Which indexes are missing? (missing_index_details, missing_index_group_stats)
In this article, we'll describe, by example, how to answer these questions using the DMOs.
11/7/2014
no primary key is defined, but just about all columns have non-clustered indexes defined on them.
In short, it's one heap of a mess, but we can't just leap in and remove indexes that our "gut instinct" tells us are not
required. In SQL Server 2005 and later, via the indexing DMOs covered in this article, we DBAs now have proper insight
into the indexes that are used and those that the optimizer is ignoring. This removes the "gut feel" factor from the process
of cleaning up incorrect, unused, and downright ignorant indexes.
However, before we start examining the scripts that we can use to uncover this information, it's worth stating up front that
blindly following the advice offered by these DMOs is not the right way to go, either. As noted earlier, defining an effective
indexing strategy is a delicate art and one that requires sound knowledge of your database design, how the data within
the tables is distributed, and how that data is queried, typically. It is beyond the scope of this article to provide a full
tutorial on how to determine an effective set of indexes, but having covered some of the things we don't like to see, it's
worth taking just a brief look at some of the things we do like.
Covering indexes
A covering index is one that contains all of the columns and data required by a query. This means that any columns used
in join or search conditions are included in the index, along with any columns that are simply selected. The latter should
be added as INCLUDE columns, rather than as part of the index key. If an index covers a query, it means that the
optimizer can return the data entirely from the index, without the need to perform a dreaded table scan, or "key lookup," to
get any non-covered data from the clustered index. This results in fewer reads, and is usually the quickest, most efficient
way to return the data. The "usually" qualification is there because, even if an index exists that you think a query
should be using, there is no guarantee that the optimizer will choose to use it.
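As a minimal sketch of the idea, assume a hypothetical dbo.SalesOrder table that is searched by CustomerID and from which only OrderDate and TotalDue are returned. The table, column and index names are illustrative assumptions and do not come from the book.

```sql
-- Hypothetical table, used only for illustration.
CREATE TABLE dbo.SalesOrder
    (
      SalesOrderID INT IDENTITY(1, 1) NOT NULL
                       CONSTRAINT PK_SalesOrder PRIMARY KEY CLUSTERED ,
      CustomerID INT NOT NULL ,
      OrderDate DATETIME NOT NULL ,
      TotalDue MONEY NOT NULL ,
      Comment NVARCHAR(400) NULL
    ) ;

-- CustomerID is the index key because it is searched on. OrderDate and
-- TotalDue are only selected, so they are stored at the leaf level of the
-- index as INCLUDE columns rather than widening the key.
CREATE NONCLUSTERED INDEX IX_SalesOrder_CustomerID_Covering
ON dbo.SalesOrder ( CustomerID )
INCLUDE ( OrderDate, TotalDue ) ;

-- Every column this query touches is present in the index, so the
-- optimizer is able to answer it without a key lookup.
SELECT  CustomerID ,
        OrderDate ,
        TotalDue
FROM    dbo.SalesOrder
WHERE   CustomerID = 42 ;
```

Adding the Comment column to the SELECT list would mean the index no longer covers the query, and the optimizer would have to choose between a key lookup and a scan of the clustered index.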
High selectivity
If you've chosen a low selectivity column for the index key (i.e. where each key value matches many rows), then the
optimizer may decide to perform a table scan to return a piece of data. Table scans have a bad reputation, but this is
because they often mean reading a huge number of rows; in small tables, scanning all the rows is sometimes quicker than
reading the data from the leaf levels of an index.
You're looking for selective columns to form your index key and, certainly, the leading (first) column should be selective.
However, this does not mean each index should start with the PK column; it must be a column that is likely to get
searched on. You can find a good discussion of index selectivity and column ordering here:
http://sqlserverpedia.com/wiki/Index_Selectivity_and_Column_Order.
Neither does the drive to cover queries mean that you should create huge, 16-column indexes in an attempt to "cover
everything at once;" if your index key values are wide, you'll fit few on a page, your index will take up a lot of space, and
scanning it will be inefficient. Searching on narrow index keys is much quicker.
Again, though, it is a balancing act; having a huge number of single column indexes is a bad idea, too. Your goal is to
make your indexes as narrow as possible while being usable by as many queries as possible. For example, if users
search on employees' last names, an index on the LastName column is probably a good idea. If users also sometimes
qualify the search with first names, then create a single index on (LastName, FirstName) as this will satisfy both queries.
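The employee example might look something like the following sketch. The dbo.Employee table and the index name are hypothetical, chosen only to match the column names used above.

```sql
-- Hypothetical table, used only for illustration.
CREATE TABLE dbo.Employee
    (
      EmployeeID INT IDENTITY(1, 1) NOT NULL
                     CONSTRAINT PK_Employee PRIMARY KEY CLUSTERED ,
      LastName NVARCHAR(50) NOT NULL ,
      FirstName NVARCHAR(50) NOT NULL
    ) ;

-- One composite index, with the column that is always searched on first.
CREATE NONCLUSTERED INDEX IX_Employee_LastName_FirstName
ON dbo.Employee ( LastName, FirstName ) ;

-- Both of these searches can seek on the index, because LastName is the
-- leading key column.
SELECT  EmployeeID
FROM    dbo.Employee
WHERE   LastName = N'Ford' ;

SELECT  EmployeeID
FROM    dbo.Employee
WHERE   LastName = N'Ford'
        AND FirstName = N'Tim' ;

-- A search on FirstName alone cannot seek on this index; at best, the
-- optimizer will scan it.
SELECT  EmployeeID
FROM    dbo.Employee
WHERE   FirstName = N'Tim' ;
```

The last query is the reason that column order matters: a separate index would only be worth considering if searches on FirstName alone were frequent.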
every database on the instance, but you will almost always want to limit it per database, using the database_id to
retrieve the index names for that database, via sys.indexes (as shown in Listing 1). Note also that the DMV does not
distinguish between partitions, so if an index is physically manifested in two or more partitions, the DMV only returns a
single record.
Listing 1 provides a listing of indexes for the database that have been used at least once during a query execution, with
those indexes that have been scanned the most listed first. A high number of scans may indicate a need to update your
statistics for a given table or index. However, equally, a high number of scans will result if the query optimizer decides
that the table is small enough that it is quicker to scan the index rather than perform a seek operation. Hence, the output
of this query should not be considered in isolation, but rather in conjunction with data regarding the selectivity and the
size of the index (which can be returned via a query against sys.dm_db_index_physical_stats, covered later in the
article).
SELECT
You will see that, in this query and all the ones that follow, we use the following formula to calculate the total number of
times that the index is used by the optimizer to resolve a user query:
[user_seeks] + [user_scans] + [user_lookups] = [user reads]
The user_updates column on its own provides the total number of times the index has been updated as a result of data
modifications (writes). From a performance tuning perspective, this DMV is invaluable as it shows exactly how the
indexes are being used and, critically, it tells us something that no previous version of SQL Server did: which indexes are
not being used or, more pertinently, not being used but being updated frequently. A similar calculation can be used to get
the total system reads of an index. However, we'll ignore any system activity from this point forward as it is almost always
negligible in comparison to user-driven activity.
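As a rough sketch of how these columns fit together (this is not one of the book's listings, and the column aliases are our own), the following query reports user reads, user writes and system reads for each index on a user table in the current database:

```sql
SELECT  OBJECT_SCHEMA_NAME(i.[object_id]) AS [schema_name] ,
        OBJECT_NAME(i.[object_id]) AS [table_name] ,
        i.[name] AS [index_name] ,
        -- total reads on behalf of user queries
        ddius.[user_seeks] + ddius.[user_scans] + ddius.[user_lookups]
                                                    AS [user_reads] ,
        -- total modifications caused by user activity
        ddius.[user_updates] AS [user_writes] ,
        -- the equivalent read total for internal system activity
        ddius.[system_seeks] + ddius.[system_scans]
        + ddius.[system_lookups] AS [system_reads]
FROM    sys.dm_db_index_usage_stats ddius
        INNER JOIN sys.indexes i ON ddius.[object_id] = i.[object_id]
                                     AND ddius.[index_id] = i.[index_id]
WHERE   ddius.[database_id] = DB_ID()   -- current database only
        AND OBJECTPROPERTY(ddius.[object_id], 'IsUserTable') = 1
ORDER BY [user_reads] DESC ;
```

The join to sys.indexes only resolves names correctly when the query is run in the database identified by DB_ID(), which is why the filter on database_id is there.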
Over the coming sections, we'll present scripts to:
find indexes on your system that have never been read or written
find indexes that have never been read but are being maintained (i.e. updated in response to modification of the
underlying table data)
get detailed read/write stats on all indexes, looking for those where the maintenance burden may outweigh their
https://www.simple-talk.com/sql/performance/tune-your-indexing-strategy-with-sql-server-dmvs/
If SQL Server has been running long enough for you to have a complete, representative workload, there is a good
chance that those indexes (and perhaps tables) are "dead," meaning they are no longer used by your database and can
potentially be dropped, after some further investigation.
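One way to look for such candidates is sketched below. It relies on our understanding that sys.dm_db_index_usage_stats only holds a row for an index once that index has been read or written since the counters were last reset, so an index with no row at all has seen no activity. The query shape and aliases are our own and are not taken from the book.

```sql
SELECT  OBJECT_SCHEMA_NAME(i.[object_id]) AS [schema_name] ,
        OBJECT_NAME(i.[object_id]) AS [table_name] ,
        i.[name] AS [index_name]
FROM    sys.indexes i
        INNER JOIN sys.objects o ON i.[object_id] = o.[object_id]
WHERE   o.[type] = 'U'              -- user tables only
        AND i.[index_id] > 0        -- ignore heaps
        -- no usage row at all: never read and never written
        AND NOT EXISTS ( SELECT 1
                         FROM   sys.dm_db_index_usage_stats ddius
                         WHERE  ddius.[database_id] = DB_ID()
                                AND ddius.[object_id] = i.[object_id]
                                AND ddius.[index_id] = i.[index_id] )
ORDER BY [schema_name] ,
        [table_name] ,
        [index_name] ;
```

An index on a table that is only touched by a periodic process will also appear here, so the result is a list of candidates for investigation rather than a list of indexes to drop.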
FROM
WHERE
GROUP BY su.[name] ,
o.[name] ,
i.[name] ,
ddius.[user_seeks] + ddius.[user_scans] + ddius.[user_lookups] ,
ddius.[user_updates]
HAVING ddius.[user_seeks] + ddius.[user_scans] + ddius.[user_lookups] = 0
ORDER BY ddius.[user_updates] DESC ,
su.[name] ,
o.[name] ,
i.[name]
Listing 3: Querying sys.dm_db_index_usage_stats for indexes that are being maintained but not used.
I ran this query recently in my production environment against a database supplied and administered by a third party; I
knew I would see some scary things, but was amazed when it returned over 120 indexes that had not been read. It is
possible, at the same time as listing these high write / zero read indexes, to generate the commands to drop them, simply
by inserting the following at the end of the SELECT clause:
'DROP INDEX [' + i.[name] + '] ON [' + su.[name] + '].[' + o.[name]
+ '] WITH ( ONLINE = OFF )' AS [drop_command]
Having verified the need to drop an index from the database, simply copy the DROP INDEX command text from the result
set into a new query window and execute it. As always, we advocate testing such processes in your development
environment first, before running against a production database. Furthermore, it is recommended you take a backup of
the database before running such a command.
As noted earlier, I would not like to encourage readers to go around wildly dropping large numbers of indexes without
proper investigation. For a start, it is always advisable to check how recently the usage stats were cleared, by querying
sys.sysdatabases, as shown in Listing 4.
SELECT
FROM sys.sysdatabases sd
WHERE sd.[name] = 'tempdb' ;
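The check works because tempdb is recreated every time the SQL Server service starts, so its creation date marks the point from which the usage counters have been accumulating. A minimal sketch of the same check follows; it uses the sys.databases catalog view in place of the older sys.sysdatabases compatibility view, and the column aliases are our own.

```sql
SELECT  d.[create_date] AS [tempdb_created] ,
        -- whole days of usage history accumulated since the last restart
        DATEDIFF(DAY, d.[create_date], GETDATE()) AS [days_of_history]
FROM    sys.databases d
WHERE   d.[name] = 'tempdb' ;
```

If the number of days returned is shorter than your longest business cycle (a month-end process, for example), the usage statistics cannot yet tell you whether an index is truly unused.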
Also, an index may not have been used recently simply because its functionality is cyclical in nature (perhaps only used
in a month-end process), or simply because it is a recently-implemented index. Once again, it is important not to drop or
create indexes without first performing adequate testing in a non-production environment.
Make sure that the SQL Server instance has been running long enough to ensure that the complete, typical workload will
be represented in the reported statistics. Again, don't forget about periodic, reporting workloads that might not show up in
the day-to-day workload. Even though the indexes that facilitate such workloads will be infrequently used, their presence
will be critical.
have its metadata in the cache, and the cumulative counts may reflect activity since the instance of
SQL Server was last started. The metadata for a less active heap or index will move in and out of the
cache as it is used. As a result, it may or may not have values available.
Since the "grain" of the function is the partition level, a table that is partitioned into five parts will have five rows in this
DMF, whereas sys.dm_db_index_usage_stats will see the object as only a single row. Use usage stats if you want
counts of each usage, as each usage is counted once. The operational stats object may have multiple values set for
each type of activity recorded. Finally, note that we cannot use APPLY operators with this DMF.
Whereas the usage stats give a feel for how an index is used by the optimizer to satisfy the needs of certain queries, the
operational stats offer more detailed information about how the index is used at a physical level, via columns such as
leaf_insert_count, leaf_update_count and leaf_delete_count (the cumulative number of leaf-level inserts,
updates and deletes), as well as the nonleaf_* equivalents, for modifications above the leaf level.
For diagnosis of resource contention on the object, the following columns are particularly useful:
row_lock_count: number of row locks that have been requested against this index
row_lock_wait_count: number of times a session has waited on a row lock against this index
row_lock_wait_in_ms: amount of time a session had to wait on a row lock against this index
page_lock_count, page_lock_wait_count, page_lock_wait_in_ms: same as the row_lock values, at the page grain
index_lock_promotion_attempt_count, index_lock_promotion_count: number of times that escalation of the lock grain for an operation using this index was attempted or granted (like from row to page)
page_latch_wait_count, page_latch_wait_in_ms: number of waits, and time waited, for the latch on the physical page of the object to be removed
page_io_latch_wait_count, page_io_latch_wait_in_ms: number of waits, and time waited, while SQL Server loads pages from disk into memory for an index operation.
This DMF offers many more columns, for example to investigate use of row overflow data, LOB data, and so on. For a full
listing, see Books Online. Let's see this DMF in action.
Detailed activity information for indexes not used for user reads
The script in Listing 6 isolates just those indexes that are not being used for user reads, courtesy of
sys.dm_db_index_usage_stats, and then provides detailed information on the type of writes still being incurred,
using the leaf_*_count and nonleaf_*_count columns of sys.dm_db_index_operational_stats. In this way,
you gain a deep feel for how indexes are being used, and just exactly how much the index is costing you.
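A minimal sketch of a query along these lines is shown below. It is our own approximation rather than the book's Listing 6, and it returns one row per partition of each index that has recorded no user reads.

```sql
SELECT  OBJECT_SCHEMA_NAME(ddios.[object_id]) AS [schema_name] ,
        OBJECT_NAME(ddios.[object_id]) AS [table_name] ,
        i.[name] AS [index_name] ,
        ddios.[partition_number] ,
        ddios.[leaf_insert_count] ,
        ddios.[leaf_update_count] ,
        ddios.[leaf_delete_count] ,
        ddios.[nonleaf_insert_count] ,
        ddios.[nonleaf_update_count] ,
        ddios.[nonleaf_delete_count]
FROM    sys.dm_db_index_usage_stats ddius
        INNER JOIN sys.indexes i ON ddius.[object_id] = i.[object_id]
                                     AND ddius.[index_id] = i.[index_id]
        -- NULL arguments mean: all objects, indexes and partitions
        INNER JOIN sys.dm_db_index_operational_stats(DB_ID(), NULL, NULL, NULL)
                                                                    ddios
            ON ddius.[object_id] = ddios.[object_id]
               AND ddius.[index_id] = ddios.[index_id]
WHERE   ddius.[database_id] = DB_ID()
        AND OBJECTPROPERTY(ddius.[object_id], 'IsUserTable') = 1
        AND i.[index_id] > 0        -- ignore heaps
        -- no user reads recorded for this index
        AND ddius.[user_seeks] + ddius.[user_scans]
            + ddius.[user_lookups] = 0
ORDER BY ddios.[leaf_insert_count] DESC ;
```

Because the operational stats are held per partition, a partitioned index appears once for each partition; summing the counts per index is a simple extension if you only need totals.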
SELECT
FROM
Upon review of the output, it's quite clear that some of these indexes are still being hammered by inserts even though the
users are not benefiting from their existence with regard to reads. If I encountered metadata like this in the real world
(wink, wink), you could be sure that I would do something about it.
FROM
WHERE ddios.row_lock_wait_count > 0
AND OBJECTPROPERTY(ddios.[object_id], 'IsUserTable') = 1
AND i.[index_id] > 0
ORDER BY ddios.[row_lock_wait_count] DESC ,
su.[name] ,
o.[name] ,
i.[name]
Notice that in the calculations of both the [%_times_blocked] and avg_row_lock_wait_in_ms columns, we've had to
use a decimal multiplication factor:
CAST (100.0 * ddios.[row_lock_wait_count] / (ddios.[row_lock_count]) AS decimal(5,2))
CAST (1.0 * ddios.[row_lock_wait_in_ms] / ddios.[row_lock_wait_count] AS decimal(15,2)).
This is due to an unfortunate quirk of data type conversion within T-SQL that you are never aware of until it
sneaks up on you, and you spend hours trying to figure out why your results don't follow basic mathematical rules.
Unless at least one value in a mathematical formula is of a decimal, float, or other non-integer numeric data type, the
formula will return an integer result, even when the math warrants a non-integer result. You can try this for yourself.
What do you get when you execute the following code in a query window?
SELECT 3/2
I bet you the answer is not 1.5. The way to fix this is to force a conversion to decimal form by including a constant that
best fits your formula, in the form of a decimal, as demonstrated in the previous calculations.
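The following short sketch, with column aliases of our own choosing, shows the effect and the fix side by side:

```sql
SELECT  3 / 2 AS [integer_division] ,
        -- both operands are integers, so the result is the integer 1
        3 / 2.0 AS [decimal_constant] ,
        -- one operand is a decimal, so the result is 1.5
        -- (displayed with trailing zeros)
        1.0 * 3 / 2 AS [decimal_factor] ,
        -- multiplying by 1.0 first converts the whole expression to decimal
        CAST(1.0 * 3 / 2 AS DECIMAL(5, 2)) AS [rounded_result] ;
        -- the CAST then fixes the scale, giving 1.50
```

The multiplication has to come first: in 3 / 2 * 1.0 the integer division is evaluated before the decimal constant is applied, so the fractional part has already been lost.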
FROM
WHERE ddios.page_io_latch_wait_count > 0
AND OBJECTPROPERTY(i.OBJECT_ID, 'IsUserTable') = 1
ORDER BY ddios.page_io_latch_wait_count DESC ,
avg_page_io_latch_wait_in_ms DESC
Latching occurs when the engine reads a physical page. Upon doing so, it issues a latch, scans the page, reads the row,
and then releases the latch when, and this is important, the page is needed for another process. This process is called
lazy latching. Though latching is quite a benign process, it is of interest to have handy such information as this query
provides. It allows us to identify which of our indexes are encountering significant waits when trying to issue a latch,
because another latch has already been issued. I/O latching occurs on disk-to-memory transfers, and high I/O latch
counts could be a reflection of a disk subsystem issue, particularly when you see average latch wait times of over 15
milliseconds.
FROM
WHERE
The sys.dm_os_wait_stats DMV is a great "first hit" resource for drilling into issues that may instigate those "Hey, the
database is slow" phone calls that we all know and love at 3 a.m. If the outcome of your queries into
sys.dm_os_wait_stats points to locking problems, the query in Listing 10 makes a good next step in the investigation.
The original idea comes from the Microsoft "SQL Server Premier Field Engineer" blog, at
http://blogs.msdn.com/b/sql_pfe_blog/, with a few enhancements to identify the indexes by name in the results.
SELECT
Notice the very useful outer join to sys.dm_db_missing_index_details to identify if there was a potential suggestion
for a missing index that may resolve the locking. Of course, before implementing any new index, you should first test it
thoroughly in your test environment, which we discuss in depth as we move on to look at the missing index DMOs.
The first thing to note is that there is no index_id in any of the missing index DMOs. This is because the returned results
are recommendations for indexes which have yet to be created, and are therefore non-materialized. The unique identifier
for the records in these DMVs is the index_handle column, which is unique across the entire SQL Server instance.
The data stored by each of these DMOs is reset on a server restart. This is why it is so important to preserve this
cumulative data and keep your instances in a constantly running state; you need to make sure, when you use this data,
that the stored statistics are fully representative of your normal query workload. One service restart, and your accrued
history (and the ability to generate meaningful results for this and other DMV-based queries) is, pardon the pun, history.
Furthermore, the data stored in these DMOs is also volatile and based on active queries. By implementing a single new
index on a given table or view, the results of the DMO query for that object may no longer be valid.
MSDN covers well the columns returned by each one (http://msdn.microsoft.com/en-us/library/ms187974.aspx) so here
we'll only review the most significant columns, for each DMO.
object_id.
is used to relate the row to the
index_handle: the handle of the index, used to relate the row to sys.dm_db_missing_index_details and
sys.dm_db_missing_index_columns.
Currently, there is only one index to a group but, for future compatibility, you should consider the key of this object to
be composed of both columns.
Listing 11 provides a quick and useful query, based on this formula, that DBAs can run to identify potentially useful
indexes. The results of this query are instance-wide, so be sure to limit your results to just the database in question, in the
WHERE clause, as demonstrated here. This query provides the DBA with information directly from the query optimizer
history, accrued since the last restart of the SQL Server service. It provides information on columns the optimizer would
have preferred to have indexed, based upon the original parse of the query upon execution. Equality columns, inequality
columns, and included columns are each identified. Also presented are the accrued counts of compiles and seeks, as
well as calculated figures that denote the amount of improvement to be gained if the indexes were created.
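A minimal sketch of a query of this type follows. It is not the book's Listing 11, and the weighting expression in the first column is an assumption on our part: a commonly used estimate that multiplies the number of seeks by the average query cost and the expected percentage improvement. It may differ from the formula the authors use.

```sql
SELECT  -- assumed weighting: seeks x average cost x expected improvement
        ddmigs.[user_seeks] * ddmigs.[avg_total_user_cost]
        * ( ddmigs.[avg_user_impact] * 0.01 ) AS [index_advantage] ,
        ddmid.[statement] AS [full_object_name] ,
        ddmid.[equality_columns] ,
        ddmid.[inequality_columns] ,
        ddmid.[included_columns] ,
        ddmigs.[unique_compiles] ,
        ddmigs.[user_seeks] ,
        ddmigs.[avg_total_user_cost] ,
        ddmigs.[avg_user_impact]
FROM    sys.dm_db_missing_index_group_stats ddmigs
        INNER JOIN sys.dm_db_missing_index_groups ddmig
            ON ddmigs.[group_handle] = ddmig.[index_group_handle]
        INNER JOIN sys.dm_db_missing_index_details ddmid
            ON ddmig.[index_handle] = ddmid.[index_handle]
WHERE   ddmid.[database_id] = DB_ID()   -- current database only
ORDER BY [index_advantage] DESC ;
```

The rows at the top of the result are the suggestions most worth investigating first; each one still needs to be checked against existing indexes and tested before anything is created.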
SELECT
FROM