SQL Tuning Guidelines
This document details the most cost-effective ways of writing SQL statements. Poorly
written SQL statements account for as much as 80% of performance problems. By using the
guidelines detailed in this document, the cost of SQL statements can be reduced drastically.
SQL Tuning Overview
• The DBA and developers should work together when tuning statements. During design
and development, the application developers can determine which combination of
system resources and Oracle features best meets the performance goals and
requirements established by the business rules. The best possible execution time
is achieved by using the least amount of resources, including CPU and logical and
physical I/O to and from the database.
• There are many tools that can assist in the tuning endeavor, such as TKPROF, the SQL
trace facility, SQL Analyze, Oracle Trace, and the Enterprise Manager Tuning Pack.
But tuning SQL statements remains a trial-and-error process.
• SQL tuning should be done before any database-level tuning effort. Until it is certain
that all the applicable SQL statements are tuned, the system parameters should not be
modified to compensate for poorly written SQL.
• SQL tuning involves the least expense compared to hardware changes such as increasing
the memory capacity of the systems.
• For SQL tuning to be accomplished, it is necessary to know the current execution
time of a statement and the resources needed to execute it successfully, so that its
performance can be measurably improved.
• It is essential to write optimal queries so that CPU resources can be shared fairly
among the various users connected to the system.
• The Oracle execution path may not be the source of the performance problem; it is
quite possible that the wrong approach to the problem has been taken. It is helpful
to look into alternative ways of accomplishing the same task. Large performance
improvements can often be achieved by attacking a simple statement from a different
angle.
• The initial focus should be on the most offending SQL, as they will yield the best
immediate returns.
Views
• Optimizing the performance of views involves optimizing the SQL statement upon
which the view is based and also the SQL that is likely to result when selection criteria
are pushed up into the view.
• Creating views containing hints can be a useful technique for optimizing queries and
SQL generated by query tools.
• Partition views can be created, which reduces the overhead of scanning a substantial
range of a large table.
• Snapshots can be used to store the results of complex queries and allow rapid
retrieval of those results, which may be somewhat out of date.
• Snapshots can reduce the overhead of SQL dramatically for static tables.
• Oracle sequences are an efficient mechanism for generating primary key values and
should be used in preference to sequence tables or other mechanisms.
• The DECODE function can be used to perform complex aggregations that might
otherwise need to be performed via multiple queries.
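As a sketch of this technique (the JOB values used here are illustrative, not taken from this document), a single pass with DECODE can replace several filtered aggregate queries:

```sql
-- One pass over EMP computes per-category figures that would
-- otherwise need one query per JOB type.
SELECT SUM(DECODE(job, 'CLERK',    sal, 0))  clerk_sal,
       SUM(DECODE(job, 'SALESMAN', sal, 0))  salesman_sal,
       COUNT(DECODE(job, 'CLERK', 1, NULL))  clerk_count
FROM   emp;
```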
Indexes
• Indexes on small tables (fewer than 100 rows) are useless, unless of course they
are there to implement unique constraints (primary or unique keys).
• Indexes with fewer than 8 distinct key values should come under very close
scrutiny (this of course refers to 'traditional' B-tree indexes, not bitmap ones).
Here, the distribution of values is extremely important: -
If the values are roughly equally distributed, the index can be scrapped.
If, however, searches are for keys that belong to a small minority of rows, the index
can be kept. It can make sense to have an index on the 'SEX' column of a personnel
table if you are frequently searching for females in a predominantly male population,
or vice versa.
• Concatenated indexes should be concatenated from the most selective to the least
selective column.
• Indexes on single columns that also appear as the first column of a concatenated
index are redundant. The database engine can use the concatenated index only when
the leading columns of the index appear in the search condition.
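A sketch of the point above (the index names are hypothetical, not taken from this document):

```sql
-- The single-column index below is redundant: any query that
-- filters on DEPTNO alone can already use the leading column
-- of the concatenated index.
CREATE INDEX emp_deptno_job_indx ON emp (deptno, job);
CREATE INDEX emp_deptno_indx     ON emp (deptno);   -- redundant
```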
SQL Processing
Here is an overview of the steps involved in SQL processing, which will help in
appreciating the need for better-written SQL statements.
1. Open
2. Parse
3. Bind
4. Execute
5. Fetch (Select statements)
6. Close
SQL Standards
It is essential to standardize the way SQL statements are written. This not only
ensures that identical statements performing the same task are shared, but also makes
the statements easier to read.
If the list of employees whose salary is greater than the average salary of their respective
department is to be found: -
SELECT ename,job,sal,comm,hiredate,deptno
FROM emp E1
WHERE sal >
(SELECT AVG(sal)
FROM emp E2
WHERE E2.deptno=E1.deptno);
There is also a technical reason to use a set style throughout the organization. If
the statement Oracle is trying to parse does not match a statement in the shared pool
precisely, character by character and case for case, Oracle will not re-use the parsed
statement and will instead create a new cursor. This means the statement will
have to undergo the Oracle parse, bind, execute, and fetch phases again. Extra
resources will also be consumed in the library cache to maintain this copy of the
statement, and a frequently used statement may actually be aged out if the shared pool
is too small.
In the following sections we will look into various situations. Under each
scenario, an appropriate and an inappropriate way to write the query are discussed.
It should be noted that both of the compared queries return the same results; the
only difference is their efficiency. The whole section deals with two tables,
namely EMP and DEPT. Their structures are given below: -
EMP table
DEPT Table
DEPTNO in the EMP table references the DEPT table. The EMPNO column is the primary
key of the EMP table and DEPTNO is the primary key of the DEPT table. The EMP table
has 950,000 rows; the DEPT table has 4 rows. It is assumed that none of the columns
have indexes unless specified. The costs depicted in the results may vary according
to the size of the tables and the number of distinct values in the columns. Also note
that the tables used here were created in a test environment; more real-world examples
will be added later as and when they come up, considering that this is the first
iteration of the document. It is also worth mentioning that an index should be created
on a column only after carefully considering the frequency and type of activity on the
column in question.
The COST, OPERATION, OPTIONS, and OBJECT_NAME columns shown in the results
are selected from the PLAN_TABLE table, which is created by running the script
utlxplan.sql (contact your DBA for creating this table).
GUIDELINE 1:
MAKE THE MOST OFTEN USED COLUMN IN A WHERE CLAUSE THE
LEADING COLUMN IN A CONCATENATED INDEX.
A composite index exists on the EMPNO and DEPTNO columns of the EMP table (i.e.
EMPNO_DEPTNO_INDX, in which EMPNO is the leading column). All employees
belonging to department 10 have to be selected.
Query 1
Query 2
Inference
There is a huge cost difference between the above two queries: almost a 62% reduction
in cost when the second approach is used.
Thus, the leading column should be the most selective column, and it should also
be the column most often used by limiting conditions in queries. It is therefore
advisable to include a dummy WHERE clause on the leading column for such queries, as
shown in the second query.
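The two queries being compared are not reproduced above; a plausible sketch of the contrast, assuming the EMPNO_DEPTNO_INDX index described in the setup:

```sql
-- Query 1 (likely form): filtering on DEPTNO alone cannot use
-- the index, because EMPNO is the leading column.
SELECT * FROM emp WHERE deptno = 10;

-- Query 2 (likely form): a dummy condition on the leading
-- column EMPNO lets the optimizer use EMPNO_DEPTNO_INDX.
SELECT * FROM emp WHERE empno > 0 AND deptno = 10;
```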
GUIDELINE 2:
AVOID USING UPPER OR LOWER FUNCTIONS ON COLUMNS THAT
ARE INDEXED.
A non-unique index exists on the ENAME column of the EMP table (i.e. ENAME_INDX). All
employees having the name 'KING' have to be selected.
Query 1
or
Query 2
Inference
The cost is reduced by an incredible 99% when the use of functions in the WHERE
clause is eliminated.
Even though the leading column of the index forms part of the WHERE clause, the
available index will not be used, because an SQL function is applied to that column.
In such cases the Oracle optimizer will not make use of the available index and will
opt for a FULL scan of the table: it has to apply the function to the value in each
row to check whether it satisfies the specified condition, which increases the cost
incurred.
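The compared queries are not shown above; they were presumably of this form, given the ENAME_INDX index on ENAME:

```sql
-- Query 1 (likely form): the UPPER function on the indexed
-- column forces a FULL table scan.
SELECT * FROM emp WHERE UPPER(ename) = 'KING';

-- Query 2 (likely form): comparing the bare column allows an
-- index range scan on ENAME_INDX.
SELECT * FROM emp WHERE ename = 'KING';
```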
GUIDELINE 3:
Query 1
Query 2
Inference
Here again, the cost is reduced by more than 82% by avoiding the use of the SUBSTR
function in the WHERE clause. As explained above, the optimizer will not make use of
the index, even though it is available, because there is a function on the indexed
column in the WHERE condition.
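The queries themselves are missing above; a plausible reconstruction of the contrast (the search string is illustrative):

```sql
-- Query 1 (likely form): SUBSTR on the indexed column ENAME
-- disables ENAME_INDX and forces a FULL table scan.
SELECT * FROM emp WHERE SUBSTR(ename, 1, 4) = 'KING';

-- Query 2 (likely form): a LIKE with a leading literal keeps
-- the column bare, so the index can be range-scanned.
SELECT * FROM emp WHERE ename LIKE 'KING%';
```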
GUIDELINE 4:
Query 1
Query 2
SELECT D.deptno,D.dname
FROM dept D
WHERE EXISTS ( SELECT 'X'
FROM emp E
WHERE E.deptno = D.deptno );
Inference
There is a reduction of more than 97% in estimated cost. In the query using the
DISTINCT clause, the explain plan shows a HASH join between the two tables. In a hash
join, the smaller of the two tables (in this case the DEPT table) is converted into a
hash table and stored in memory. For every row retrieved from the larger table (the
EMP table), the hash table is scanned to check the join condition. See Note 1 in the
Addendum for a brief explanation of hash joins.
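Query 1 is not reproduced above; given the EXISTS form shown as Query 2, it was presumably a DISTINCT join of this shape:

```sql
-- Query 1 (likely form): the join produces one row per matching
-- employee, and DISTINCT then forces a costly de-duplication.
SELECT DISTINCT D.deptno, D.dname
FROM   dept D, emp E
WHERE  E.deptno = D.deptno;
```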
GUIDELINE 5:
The contents of two tables, EMP1 and DEPT, which hold different data, have to be
displayed in a combined fashion.
Query 1
SELECT empno FROM emp1
UNION
SELECT deptno FROM dept;
Query 2
Inference
The estimated cost of processing the query is reduced by 96% when UNION ALL is used
instead of UNION. UNION ALL simply returns all rows, including duplicates, and does
not have to perform the sort, merge, or filter operations that UNION performs. These
costly operations are unnecessary in the above case, since the selected columns are
unique.
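Query 2 is not shown above; it presumably differed from Query 1 only in the set operator:

```sql
-- Query 2 (likely form): UNION ALL returns the combined rows
-- directly, skipping the sort/merge/filter that UNION performs.
SELECT empno FROM emp1
UNION ALL
SELECT deptno FROM dept;
```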
GUIDELINE 6:
The maximum SALARY in each JOB type has to be displayed. JOB types of
'CLERK' and 'SALESMAN' are NOT to be considered.
Query 1
SELECT JOB,MAX(sal)
FROM emp
GROUP BY job
HAVING job NOT IN ('CLERK','SALESMAN');
Query 2
SELECT JOB,MAX(sal)
FROM emp
WHERE job NOT IN ('CLERK','SALESMAN')
GROUP BY job;
Inference
The estimated cost of processing the second query is less than that of the first one
by 33%.
In the first case, the GROUP operation is performed on the specified columns and the
required rows are then FILTERED based on the condition specified (eliminating those
rows where JOB is CLERK or SALESMAN). Since the filtering is performed after the
group operation, the cost increases.
In the second case, the required rows are first selected by a FULL table scan on the
EMP table, and only those rows are then grouped according to the specification.
The HAVING clause filters selected records only after all rows have been fetched and
grouped. Thus, it is advisable to use the WHERE clause in combination with a
GROUP BY clause to reduce this overhead.
GUIDELINE 7:
Select the EMPNO, ENAME, and DNAME from the EMP and DEPT tables. This
example demonstrates the efficiency of a CLUSTER in the case of joined tables.
SELECT empno,ename,dname
FROM emp,dept
WHERE emp.deptno=dept.deptno;
Inference
The execution plans for the above join query show a reduction in cost of 33% in favour
of clustered tables.
Hence it is advisable to use clusters for tables that are predominantly used together
through a join condition. See Note 2 in the Addendum for limitations and capabilities
of clusters.
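The clustered variant of the tables is not shown; a minimal sketch of how such a cluster could be built (the names and SIZE value are assumptions, not taken from this document):

```sql
-- A cluster keyed on DEPTNO stores EMP and DEPT rows with the
-- same DEPTNO in the same blocks, so the join needs less I/O.
CREATE CLUSTER emp_dept_cluster (deptno NUMBER(2)) SIZE 1024;
CREATE INDEX emp_dept_cluster_indx ON CLUSTER emp_dept_cluster;

CREATE TABLE dept_c (deptno NUMBER(2) PRIMARY KEY,
                     dname  VARCHAR2(14))
  CLUSTER emp_dept_cluster (deptno);
CREATE TABLE emp_c  (empno  NUMBER(4) PRIMARY KEY,
                     ename  VARCHAR2(10),
                     deptno NUMBER(2))
  CLUSTER emp_dept_cluster (deptno);
```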
GUIDELINE 8:
Generating an ordered output of the ENAME column (a varchar column). The ENAME
column has a non-unique index (ENAME_INDX).
Query 1
SELECT ename
FROM emp
ORDER BY ename;
Query 2
SELECT ename
FROM emp
WHERE ename > TO_CHAR(0);
Inference
The cost is reduced by an incredible 99% when the second query is used. It is clear
from the above example that the Oracle optimizer ignores indexes created on character
columns when ordering output. Hence it is necessary to explicitly force the usage of
the index to reduce the I/O costs. Use of a dummy WHERE clause, as shown in the second
query, is effective in such cases.
NOTE: - This behaviour also applies to nullable columns (even nullable number
columns). However, it does not apply to NOT NULL number columns: if an index is
available on such a column, Oracle uses it if necessary.
GUIDELINE 9:
Inference
With a function-based index (FBI), the processing cost is reduced by 98%. Avoid doing
calculations on indexed columns: when the optimizer encounters a calculation on an
indexed column, it will not use the index and will perform a full table scan. If a
need arises to perform such calculations and specify them in the WHERE clause, create
an FBI. See Note 3 in the Addendum for the capabilities of FBIs.
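The queries for this guideline are missing above; a sketch of the kind of contrast it describes (the expression and index name are assumptions):

```sql
-- Without an FBI, this calculation on SAL forces a FULL scan,
-- because the indexed column does not appear bare.
SELECT * FROM emp WHERE sal * 12 > 50000;

-- A function-based index on the same expression lets the
-- optimizer use an index range scan instead.
CREATE INDEX emp_annual_sal_indx ON emp (sal * 12);
```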
GUIDELINE 10:
The detailed list of all those employees who have been hired today has to be displayed.
An index exists on the HIREDATE column of the emp table.
Query 1
Query 2
Inference
The cost in the second case is reduced by 99% compared to the first. As said before,
the use of functions on indexed columns disables the use of indexes. The second query
shows a way to overcome this problem.
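The two queries are not reproduced above; given the scenario, they were presumably of this shape:

```sql
-- Query 1 (likely form): TRUNC on the indexed HIREDATE column
-- disables the index.
SELECT * FROM emp WHERE TRUNC(hiredate) = TRUNC(SYSDATE);

-- Query 2 (likely form): a range condition keeps the column
-- bare, so the index on HIREDATE can be used.
SELECT * FROM emp
WHERE  hiredate >= TRUNC(SYSDATE)
AND    hiredate <  TRUNC(SYSDATE) + 1;
```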
GUIDELINE 11:
It is always good practice to use bind variables. It is also advisable to be
consistent in the manner in which statements are typed (i.e. consistent use of upper
and lower case, uniform use of spaces, etc.). These practices help Oracle save the
time and resources that would otherwise be spent on re-parsing the same statements.
Example1:
For the above queries, which perform the same task, parsing will be done three times
because Oracle considers them to be three separate queries. The queries should
therefore be written in a consistent manner, so that the next time an identical query
is issued, Oracle re-uses the already parsed statement.
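The three queries of Example 1 are not shown; they were presumably textual variants of one statement, along the lines of:

```sql
-- Three textually different statements: Oracle parses each one
-- separately, even though they perform the same task.
SELECT ename FROM emp WHERE deptno = 10;
SELECT ENAME FROM EMP WHERE DEPTNO = 10;
select ename from emp where deptno=10;
```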
Example2:
For the above queries bind variables should be used so as to make Oracle reuse the
same statement.
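The queries of Example 2 are likewise not shown; a sketch of the bind-variable form (SQL*Plus syntax; the variable name is an assumption):

```sql
-- With a bind variable, the statement text stays identical for
-- every department value, so the parsed form is shared.
VARIABLE dept_id NUMBER
EXECUTE :dept_id := 10

SELECT ename FROM emp WHERE deptno = :dept_id;
```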
GUIDELINE 12:
Inference
In the above example, Oracle does a full table scan, as it does not find any index on
the EMPNO column, and the cost incurred is huge. After the index is created, Oracle
does a full scan of the index instead, and the cost incurred is reduced by more than
99%.
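Neither the query nor the index DDL for this guideline survives above; a plausible sketch consistent with the inference (the table set-up and index name are assumptions, since elsewhere the document states EMPNO is the primary key of EMP):

```sql
-- Before: with no index on EMPNO, retrieving the column means a
-- FULL scan of the 950,000-row table.
SELECT empno FROM emp;

-- After creating an index, Oracle can answer the same query
-- with a much cheaper full scan of the index alone.
CREATE INDEX empno_indx ON emp (empno);
```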
GUIDELINE 13:
All employees who have a valid department ID are to be displayed (i.e. their
corresponding DEPTNO must exist in the DEPT table). Here a primary key index is
present on the DEPTNO column of the DEPT table.
NOTE: - The need to run this query may arise if the relational constraints are not in
place.
Query 1
SELECT E.empno,E.ename,D.deptno
FROM emp E,dept D
WHERE E.deptno = D.deptno;
Query 2
SELECT E.empno,E.ename,E.deptno
FROM emp E
WHERE EXISTS (SELECT deptno
FROM dept D
WHERE D.deptno = E.deptno);
Inference
The cost of processing the query is reduced by almost 33% when EXISTS is used instead
of a join condition. Consider using EXISTS instead of joining the tables when the
percentage of successful rows returned from the driving table (i.e. the number of
rows that need to be validated against the subquery) is small. In such a case, a
table join would be inefficient.
GUIDELINE 14:
Query 1
SELECT COUNT(job)
FROM emp;
Query 2
or
Inference
The estimated processing costs are reduced by more than 99% in the second case. While
using the COUNT function in a select statement, make sure that you count either the
ROWID or an indexed column; with these, performance is increased and the cost
incurred is low. This is because, if COUNT is applied to a non-indexed column, the
Oracle optimizer will opt for a FULL table scan rather than use an index existing on
some other column. Also, if COUNT(*) is used, Oracle will first resolve the columns
of the table and then decide on the appropriate path for the query.
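Query 2 and its alternative are not shown above; given the inference, they were presumably of this shape (EMPNO being the indexed primary key):

```sql
-- Query 2 (likely form): counting an indexed column lets the
-- optimizer answer from the index alone.
SELECT COUNT(empno) FROM emp;

-- or, counting ROWIDs:
SELECT COUNT(ROWID) FROM emp;
```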
GUIDELINE 15:
The employee details, along with their respective department names, are to be
displayed. This approach is only feasible if the second table has a very small number
of rows.
Query 1
Inference
The reduction in the estimated cost is almost 50% in the second case. Use the DECODE
function to reduce processing and to avoid having to scan the same rows repetitively.
This increases the performance and reduces the cost incurred. This method, however,
is feasible only if the second table is small.
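Neither query of this guideline is reproduced above; a sketch of the contrast the inference describes (the department names are illustrative):

```sql
-- Query 1 (likely form): a join against the 4-row DEPT table.
SELECT E.empno, E.ename, D.dname
FROM   emp E, dept D
WHERE  E.deptno = D.deptno;

-- Query 2 (likely form): DECODE hard-codes the small lookup,
-- avoiding the join entirely.
SELECT empno, ename,
       DECODE(deptno, 10, 'ACCOUNTING', 20, 'RESEARCH',
                      30, 'SALES',      40, 'OPERATIONS') dname
FROM   emp;
```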
GUIDELINE 16:
Select the EMPNO, ENAME, DNAME from the EMP and DEPT tables where the deptno
is 10 or 30. This example demonstrates the efficiency of partitioning large tables.
Query 1
SELECT E.empno,E.ename,D.deptno,D.dname
FROM emp E,dept D
WHERE E.deptno=D.deptno AND E.deptno IN (10,30);
Query 2
SELECT E.empno,E.ename,D.deptno,D.dname
FROM emp_part E,dept D
WHERE E.deptno=D.deptno AND E.deptno IN (10,30);
OPERATION OPTIONS OBJECT_NAME COST
------------------------------ ------------------------------ ------------------------------ ----------
SELECT STATEMENT 851
HASH JOIN 851
TABLE ACCESS FULL DEPT 1
PARTITION RANGE INLIST
TABLE ACCESS FULL EMP_PART 612
Inference
The cost comes down by 20% when the table is partitioned. Large tables can be
partitioned based either on RANGE or on HASH. Range partitioning is useful when the
table data can be distributed among many logical ranges based on the values of a
particular column.
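The DDL for EMP_PART is not shown; a minimal sketch of a range-partitioned variant of EMP (the column list and partition bounds are assumptions):

```sql
-- EMP_PART holds the same data as EMP, split into ranges of
-- DEPTNO so that the IN (10, 30) predicate prunes partitions.
CREATE TABLE emp_part (
  empno  NUMBER(4),
  ename  VARCHAR2(10),
  deptno NUMBER(2)
)
PARTITION BY RANGE (deptno) (
  PARTITION p10 VALUES LESS THAN (20),
  PARTITION p20 VALUES LESS THAN (30),
  PARTITION p30 VALUES LESS THAN (40),
  PARTITION pmax VALUES LESS THAN (MAXVALUE)
);
```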
GUIDELINE 17:
Select the employee details of all employees having EMPNO's less than
1000.
Query 1
SELECT empno,ename
FROM emp
WHERE empno < 1000;
Query 2
SELECT empno,ename
FROM empi
WHERE empno < 1000;
OPERATION OPTIONS OBJECT_NAME COST
------------------------------ ------------------------------ ------------------------------ ----------
SELECT STATEMENT 10
INDEX RANGE SCAN EMPI_EMPNO_PK 2
Inference
The processing costs are reduced by almost 99% in the case of the second query. In
the above example, both EMP and EMPI have 950,000 rows with the same data and have
EMPNO as the primary key. The only difference is that the EMPI table is an INDEX
ORGANIZED TABLE.
See Note 5 in the Addendum for syntax and information on index-organized tables.
Addendum
Note1
1. Hash Joins:
Hash joins are used for joining large data sets. The optimizer uses the smaller of
the two tables/data sources to build a hash table on the join key in memory. It then
scans the larger table, probing the hash table to find the joined rows. This
operation involves joining two sets of rows and returning the result. Apart from
this, the result may need to be SORTED for unique values, i.e. an operation
involving sorting a set of rows to eliminate duplicates. If the hash table is small
enough to fit into memory, the cost is limited to a single read pass over the two
sets of data. But if the hash table is too big to fit into memory, the optimizer
partitions it.
Note2
2. Clusters:
In this cluster, DEPTNO is the cluster key. Choose your cluster key carefully; it
should represent the join condition between the clustered tables.
(i) Clusters should be used for tables predominantly used in join queries, where
they help increase performance. They should be preferred in data warehouse
environments, where the data in the clustered tables is not likely to change often.
(ii) Clusters can degrade the performance of your database when used in OLTP
(Online Transaction Processing) environments, where the data in the clustered tables
is likely to be modified often.
You are advised to make use of clusters only after a proper evaluation of the
environment in which your application will work.
Note3
3. Function-Based Indexes:
(2) They provide an efficient mechanism for evaluating predicates involving functions.
(4) Descending order indexes can be created. They are treated as a special case of
function-based indexes.
Note4
4. Partitioned Tables:
Partitioned tables allow your data to be broken down into smaller, more
manageable pieces called partitions, or even subpartitions. Indexes may be
partitioned in similar fashion. Each partition can be managed individually, and can
operate independently of the other partitions, thus providing a structure that can
be better tuned for availability and performance.
Note5
5. Index-Organized Tables:
Index-organized tables are like regular tables with a primary key index on
one or more of its columns. However, instead of maintaining two separate storage spaces
for the table and B*tree index, an index-organized table only maintains a single B*tree
index containing the primary key of the table and other column values.
Index-organized tables provide fast key-based access to table data for
queries involving exact match and range searches especially on the primary key as seen
above. Changes to the table data (such as adding new rows, updating rows, or deleting
rows) result only in updating the index structure (because there is no separate table
storage area). Index-organized tables are suitable for accessing data by way of primary
key or any key that is a valid prefix of the primary key. There is no duplication of key
values and storage requirements are reduced because a separate index structure
containing the key values and ROWID is not created.