
SQL TUNING

TABLE OF CONTENTS

PURPOSE OF THE DOCUMENT

DOCUMENT USAGE

DETAILS OF DOCUMENT

SQL TUNING OVERVIEW

1. WHY IS SQL TUNING IMPORTANT?
2. SOME THINGS TO KNOW ABOUT SQL TUNING
3. GENERAL TECHNIQUES FOR IMPROVING SQL PERFORMANCE
4. SQL PROCESSING

SQL STANDARDS

THE INAPPROPRIATE WAY OF WRITING THE QUERY
THE APPROPRIATE WAY OF WRITING THE QUERY
1. MAKE THE MOST OFTEN USED COLUMN IN A WHERE CLAUSE AS THE LEADING COLUMN IN A CONCATENATED INDEX
2. AVOID USING UPPER OR LOWER FUNCTIONS ON COLUMNS WHICH ARE INDEXED
3. AVOID USING SUBSTR FUNCTION ON INDEXED COLUMNS
4. USE EXISTS IN PLACE OF DISTINCT
5. WHICH IS BETTER - UNION ALL OR UNION?
6. IS HAVING CLAUSE COSTLY?
7. USE CLUSTERS FOR TABLES WHICH ARE OFTEN USED TOGETHER IN A JOIN CONDITION
8. USE DUMMY WHERE CLAUSES TO FORCE USE OF INDEXES ON CHARACTER COLUMNS
9. USE FUNCTION BASED INDEXES ON COLUMNS WHICH ARE OFTEN USED WITH A FUNCTION
10. AVOID USING FUNCTIONS LIKE TRUNC ON INDEXED COLUMNS
11. CONSIDER THE USE OF BIND VARIABLES
12. CREATE INDEXES ON COLUMNS, WHICH ARE OFTEN USED FOR SORTING PURPOSES
13. USE EXISTS IN PLACE OF JOINS WHERE THE NUMBER OF SUCCESSFUL ROWS RETURNED FROM THE DRIVING TABLE IS SMALL
14. USE EITHER ROWID OR AN INDEXED COLUMN IN COUNT FUNCTION
15. USE DECODE WHEREVER POSSIBLE
16. USE PARTITIONS FOR LARGE TABLES FOR FASTER RETRIEVAL OF DATA
17. USE INDEX-ORGANIZED TABLES FOR QUERIES INVOLVING EXACT MATCH AND RANGE SEARCHES ESPECIALLY ON THE PRIMARY KEY

ADDENDUM

1. HASH JOINS
2. CLUSTERS
3. FUNCTION BASED INDEXES
4. PARTITIONED TABLES
5. INDEX-ORGANIZED TABLES
Purpose of the document

The document details the most cost-effective way of writing SQL statements. Poorly
written SQL statements account for 80% of performance problems. Using the guidelines
detailed in this document, the cost of SQL statements can be reduced drastically.

Document Usage

The target audience of this document includes:

• SQL Programmers/Developers
• DBAs
• Technical Leads
• Testing/Quality Assurance

Details of Document
SQL Tuning Overview

1. Why is SQL tuning important?

• First in the series of steps undertaken for tuning the database.


• One of the least expensive steps.
• Generally yields the highest possible performance gain

2. Some things to know about SQL tuning

• The DBA and developers should work together when tuning statements. During design
and development, the application developers can determine which combination of
system resources and Oracle features best meets the performance goals and
requirements established by the business rules. The best possible execution time is
achieved by using the least amount of resources, including logical and physical I/O to
and from the database and CPU usage.

• There are many tools that can assist in the tuning endeavor such as TKPROF, the SQL
trace facility, SQL Analyze, Oracle Trace, and the Enterprise Manager Tuning Pack.
But tuning SQL statements remains a trial and error process.

• SQL tuning should be done before any database-level tuning effort. Until it is certain
that all the applicable SQL statements are tuned, the system parameters should not be
modified to compensate for poorly written SQL.

• SQL tuning involves far less expense than hardware changes such as increasing the
memory capacity of the system.

• To accomplish SQL tuning, it is necessary to know the current execution time of a
statement and the resources needed to execute it successfully, so that its
performance can be improved.

• It is essential to write optimal queries so that CPU resources can be shared fairly
among the various users connected to the system.

• The Oracle execution path may not be the source of the performance problem; very
often the wrong approach to the problem has been followed, and the chosen path is
not the best one to follow. It helps to look into alternative ways of accomplishing the
same task. Large performance improvements can be achieved by making a simple
statement work from a different angle.

• The initial focus should be on the most offending SQL statements, as they will yield
the best immediate returns.

3. General Techniques For Improving SQL Performance

Views

• Optimizing the performance of views involves optimizing the SQL statement upon
which the view is based and also the SQL that is likely to result when selection criteria
are pushed up into the view.

• Creating views containing hints can be a useful technique for optimizing queries and
SQL generated by query tools.

• Partition views can be created, which reduce the overhead of scanning a substantial
range of large tables.

• Snapshots can be used to store the results of complex queries and allow the rapid
retrieval of those results, which may be somewhat out of date.

• Performance of a simple snapshot refresh is improved by the creation of a snapshot
log, provided that only a small proportion of the source table has been changed (see
the sketch after this list).

• Snapshots can reduce the overhead of SQL dramatically for static tables.

• Oracle sequences are an efficient mechanism for generating primary key values and
should be used in preference to sequence tables or other mechanisms.

• The DECODE operator can be used to perform complex aggregations, which might
otherwise need to be performed via multiple queries.
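
For example, a snapshot of EMP with a snapshot log on the master table might be created along the following lines. This is a minimal sketch only; the name emp_snap is illustrative, and on later Oracle releases the equivalent CREATE MATERIALIZED VIEW syntax applies.

-- The snapshot log records changes to EMP so that refreshes can be fast (incremental)
CREATE SNAPSHOT LOG ON emp;

-- A simple snapshot of EMP that is refreshed incrementally from the log
CREATE SNAPSHOT emp_snap
REFRESH FAST
AS SELECT * FROM emp;
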
Indexes

• Indexes on small tables (fewer than 100 rows) are totally useless, unless of course
they are there to implement unique constraints (primary or unique keys).

• Indexes for which there are fewer than 8 different key values should come under very
close scrutiny (this of course refers to 'traditional' tree-type indexes, not bitmap ones).
Here, the distribution of values is extremely important: if the values are roughly equally
distributed, the index can be scrapped; if, however, searches are for keys that belong
to a small minority, then the index can be kept. It can make sense to have an index on
the 'SEX' column of a personnel table if you are frequently searching for females in a
predominantly male environment, or vice versa.

• Concatenated indexes should be concatenated from the most selective to the least
selective column.

• Indexes on single columns that also appear as the first column in a concatenated
index are redundant. Database engines can use the concatenated index only when the
leading column of the index appears in the search condition (see the sketch after this list).
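
As a sketch, the concatenated index used later in this document (Guideline 1) could be created as shown below; the single-column index in the comment is shown only to illustrate the redundancy point.

-- Concatenated index with the more selective column (empno) leading
CREATE INDEX empno_deptno_indx ON emp (empno, deptno);

-- A separate single-column index on empno would be redundant, because the
-- concatenated index above can already be used whenever empno appears in
-- the search condition:
-- CREATE INDEX emp_empno_only_idx ON emp (empno);    -- not needed
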

4. SQL Processing

Here is an overview of SQL processing that will help in appreciating the need for
well-written SQL statements.

SQL Processing Phases

1. Open
2. Parse
3. Bind
4. Execute
5. Fetch (Select statements)
6. Close

The PARSE phase

• Checks to see if the statement already exists in the shared pool


• Checks to see if the statement is valid SQL language
• Oracle will then look to see if the referenced objects exist.
• Oracle will see if the parsing user has permission on all underlying objects.
• Explodes any queries on views and attempts to simplify the statement by using a
join.
• Picks an execution plan and stores it with the cursor
The BIND phase

• Searches the statement for bind variables
• Assigns a value to the parsed statement

The EXECUTE phase

• Oracle runs the execution plan against the block buffers.
• Oracle performs all the required input and output calls to gather the data for the
DML statement. This phase may require Oracle to perform sorting operations.

The FETCH phase

• Retrieves all the rows for the select statement
• Possibly sorts the retrieved records
• All returned rows are retrieved using an array fetch

SQL Standards
It is essential to standardize the way SQL statements are written. This not only
ensures that identical statements performing the same tasks are shared, but also makes
the statements easier to read.

Suppose the list of employees whose salary is greater than the average salary of their
respective department is to be found:

The inappropriate way of writing the query

select ename,job,sal,comm,hiredate,deptno from emp e1 where sal > (select avg(sal) from emp e2 where e2.deptno=e1.deptno);

The appropriate way of writing the query

SELECT ename,job,sal,comm,hiredate,deptno
FROM emp E1
WHERE sal >
(SELECT AVG(sal)
FROM emp E2
WHERE E2.deptno=E1.deptno);

There is also a technical reason to use a set style throughout the organization. If
the statement Oracle is trying to parse does not match a statement in the shared pool
precisely, character by character and case for case, Oracle will not re-use the parsed
statement and will instead opt to create a new cursor. This means the statement will
have to undergo the Oracle parse, bind, execute, and fetch phases again. Extra
resources in the library cache will also be used to maintain this copy of the statement.
You may actually age out a frequently used statement if the shared pool is too small.
In the following section we look into various situations. Under each
scenario, an appropriate and an inappropriate way to write the query are discussed. It
should be noted that both of the compared queries return the same results; the only
difference is their efficiency. The whole section deals with two tables,
namely EMP and DEPT. Their structures are given below:

EMP table

EMPNO NOT NULL NUMBER(4)


ENAME VARCHAR2(10)
JOB VARCHAR2(9)
MGR NUMBER(4)
HIREDATE DATE
SAL NUMBER(7,2)
COMM NUMBER(7,2)
DEPTNO NOT NULL NUMBER(2)

DEPT Table

DEPTNO NOT NULL NUMBER(2)


DNAME VARCHAR2(14)
LOC VARCHAR2(13)

DEPTNO in the EMP table references the DEPT table. The EMPNO column is the primary
key of the EMP table and DEPTNO is the primary key of the DEPT table. The EMP table
has 9.5 lakh (950,000) rows; the DEPT table has 4 rows. It is assumed that none of the
columns have indexes unless specified. The cost depicted in the results may vary
according to the size of the table and the number of distinct values in the columns. Also
note that the tables used here were created in a test environment; more real-world
examples will be added in later iterations of this document. It is also worth mentioning
that an index should be created on a column only after carefully considering the
frequency and type of activity on the column in question.
The COST, OPERATION, OPTIONS and OBJECT_NAME columns shown in the results
are selected from the table PLAN_TABLE, which is created by running the script
utlxplan.sql (contact your DBA for creating this table).
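
For reference, the plans shown in this document can be reproduced with the EXPLAIN PLAN command once PLAN_TABLE exists; the following is a minimal sketch using the query from Guideline 1.

-- Clear any rows left over from a previous EXPLAIN PLAN
DELETE FROM plan_table;

-- Generate the execution plan without actually running the statement
EXPLAIN PLAN FOR
SELECT * FROM emp WHERE deptno = 10;

-- The columns referred to throughout this document
SELECT operation, options, object_name, cost
FROM plan_table
ORDER BY id;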

GUIDELINE 1:
1. MAKE THE MOST OFTEN USED COLUMN IN A WHERE CLAUSE AS THE
LEADING COLUMN IN A CONCATENATED INDEX.

A composite index exists on the empno and deptno columns of the emp table (i.e.
EMPNO_DEPTNO_INDX, in which empno is the leading column). All employees
belonging to department 10 have to be selected.

Query 1

SELECT * FROM emp
WHERE deptno = 10;

OPERATION OPTIONS OBJECT_NAME COST


------------------------------ ------------------------------ ------------------------------ ----------
SELECT STATEMENT 822
TABLE ACCESS FULL EMP 822

Query 2

SELECT * FROM emp
WHERE empno > 0 AND deptno = 10;

OPERATION OPTIONS OBJECT_NAME COST


------------------------------ ------------------------------ ------------------------------ ----------
SELECT STATEMENT 293
TABLE ACCESS BY INDEX ROWID EMP 293
INDEX RANGE SCAN EMPNO_DEPTNO_INDX 26

Inference

There is a huge cost difference between the above two queries, almost a 62% reduction
in the cost when the second approach is used.
Thus, the leading column should be the most selective column, and it should also
be the column most often used by limiting conditions in queries. So it is advisable to
include a dummy WHERE clause for such queries as shown in the second query.

GUIDELINE 2:
2. AVOID USING UPPER OR LOWER FUNCTIONS ON COLUMNS WHICH
ARE INDEXED.

A non-unique index exists on the ename column of the emp table (i.e. ENAME_INDX). All
employees having the name 'KING' have to be selected.

Query 1

SELECT ename FROM emp
WHERE UPPER (ename) = 'KING';

or

SELECT ename FROM emp
WHERE LOWER(ename) = 'king';

OPERATION OPTIONS OBJECT_NAME COST


------------------------------ ------------------------------ ------------------------------ ----------
SELECT STATEMENT 822
TABLE ACCESS FULL EMP 822

Query 2

SELECT ename FROM emp
WHERE ename='KING';

OPERATION OPTIONS OBJECT_NAME COST


------------------------------ ------------------------------ ------------------------------ ----------
SELECT STATEMENT 2
INDEX RANGE SCAN ENAME_INDX 2

Inference

The cost is reduced by an incredible 99% when the use of functions in the WHERE
clause is eliminated.
Even though the leading column of the index forms part of the WHERE clause, the
available index will not be used, because an SQL function is applied to that column. In
such cases the Oracle optimizer will not make use of the available index and will opt for a
FULL scan of the table. Oracle has to apply the function to the value in each row to check
whether it satisfies the specified condition, which increases the cost incurred.

GUIDELINE 3:

3. AVOID USING SUBSTR FUNCTION ON INDEXED COLUMNS.


A non-unique index exists on the ename column of the emp table (i.e.
ENAME_INDX). All employees whose names have 'KI' as the first two letters have to
be selected.

Query 1

SELECT ename FROM emp
WHERE SUBSTR (ename,1,2) = 'KI';

OPERATION OPTIONS OBJECT_NAME COST


------------------------------ ------------------------------ ------------------------------ ----------
SELECT STATEMENT 822
TABLE ACCESS FULL EMP 822

Query 2

SELECT ename FROM emp
WHERE ename LIKE 'KI%';

OPERATION OPTIONS OBJECT_NAME COST


------------------------------ -------------------- ----------------------- ----------
SELECT STATEMENT 148
INDEX RANGE SCAN ENAME_INDX 148

Inference

Here again, the cost is reduced by more than 82% by avoiding the use of the SUBSTR
function in the WHERE clause. As explained above, the optimizer will not make use of
the index, even though it is available, because there is a function on the indexed column
in the WHERE condition.

GUIDELINE 4:

4. USE EXISTS IN PLACE OF DISTINCT.

The DISTINCT combinations of the deptno and dname columns have to be displayed.
Only those rows whose deptno exists in the EMP table need be displayed.

Query 1

SELECT DISTINCT E.deptno,D.dname
FROM emp E,dept D
WHERE D.deptno = E.deptno;

OPERATION OPTIONS OBJECT_NAME COST
------------------------------ ------------------------------ ------------------------------ ----------
SELECT STATEMENT 14659
SORT UNIQUE 14659
HASH JOIN 1000
TABLE ACCESS FULL DEPT 1
TABLE ACCESS FULL EMP 822

Query 2

SELECT D.deptno,D.dname
FROM dept D
WHERE EXISTS ( SELECT 'X'
FROM emp E
WHERE E.deptno = D.deptno );

OPERATION OPTIONS OBJECT_NAME COST


------------------------------ ------------------------------ ------------------------------ ----------
SELECT STATEMENT 1
FILTER
TABLE ACCESS FULL DEPT 1
TABLE ACCESS FULL EMP1 822

Inference

There is a reduction of more than 97% in estimated cost. In the query using the DISTINCT
clause, the explain plan shows that there is a HASH join between the two tables. In this,
the smaller of the two tables (in this case the DEPT table) is converted into a hash table
and stored in memory. For every row of the larger table (EMP table) retrieved, the hash
table is scanned to check for the join condition. See the Addendum (Note 1) for a brief
explanation of hash joins.

The query using EXISTS uses a FILTER operation. The FILTER
operation accepts a set of rows, eliminates some of them, and returns the rest. In
the above case, the rows from the DEPT and EMP tables are filtered based on
the join condition. EXISTS is a faster and cheaper alternative because the
optimizer realizes that once the sub-query has been satisfied, there is no
need to proceed any further, and the next driving row can be fetched. In the
example shown, only one row needs to be returned from DEPT even though many
rows in EMP might match the subquery.

GUIDELINE 5:

5. WHICH IS BETTER - UNION ALL OR UNION?

Two tables, EMP1 and DEPT, containing different data have to be displayed in a
combined fashion.

Query 1
SELECT empno FROM emp1
UNION
SELECT deptno FROM dept;

OPERATION OPTIONS OBJECT_NAME COST


------------------------------ ------------------------------ ------------------------------ ----------
SELECT STATEMENT 7378
SORT UNIQUE 7378
UNION-ALL
INDEX FAST FULL SCAN EMP1_EMPNO_IDX 291
TABLE ACCESS FULL DEPT 1

Query 2

SELECT empno FROM emp1
UNION ALL
SELECT deptno FROM dept;

OPERATION OPTIONS OBJECT_NAME COST


------------------------------ ------------------------------ ------------------------------ ----------
SELECT STATEMENT 292
UNION-ALL
INDEX FAST FULL SCAN EMP1_EMPNO_IDX 291
TABLE ACCESS FULL DEPT 1

Inference

The estimated cost of processing the query is reduced by 96% when UNION ALL is used
instead of UNION. UNION ALL simply returns all rows, including duplicates, and does not
have to perform the sort, merge or filter operations performed by UNION. These costly
operations are unnecessary in the above case, especially since unique columns are
being selected.

GUIDELINE 6:

6. IS HAVING CLAUSE COSTLY?

The maximum SALARY in each JOB type has to be displayed. JOB types of
'CLERK' and 'SALESMAN' are NOT to be considered.

Query 1

SELECT JOB,MAX(sal)
FROM emp
GROUP BY job
HAVING job NOT IN ('CLERK','SALESMAN');

OPERATION OPTIONS OBJECT_NAME COST


------------------------------ ------------------------------ ------------------------------ ----------
SELECT STATEMENT 11443
FILTER
SORT GROUP BY 11443
TABLE ACCESS FULL EMP 822

Query 2

SELECT JOB,MAX(sal)
FROM emp
WHERE job NOT IN ('CLERK','SALESMAN')
GROUP BY job;

OPERATION OPTIONS OBJECT_NAME COST


------------------------------ ------------------------------ ------------------------------ ----------
SELECT STATEMENT 7623
SORT GROUP BY 7623
TABLE ACCESS FULL EMP 822

Inference

The estimated cost of processing the second query is less than that of the first one by
33%.
In the first case, the GROUP operation is performed on the specified columns and the
required rows are then FILTERED out based on the condition specified (eliminating those
rows where JOB is CLERK or SALESMAN). Since the filtering is performed after the
group operation, the cost increases.
In the second case, the required rows are first selected by a FULL table scan on the EMP
table. These rows are then grouped according to the specification.
The HAVING clause filters the selected records only after all rows have been fetched
and grouped. Thus, it is advisable to use the WHERE clause in combination with a
GROUP BY clause to reduce these overheads.

GUIDELINE 7:

7. USE CLUSTERS FOR TABLES WHICH ARE OFTEN USED TOGETHER IN A JOIN CONDITION.

Select the EMPNO, ENAME and DNAME from the EMP and DEPT tables. This
example demonstrates the efficiency of a CLUSTER in the case of joined tables.

Query 1 (no cluster exists)


SELECT empno, ename, dname
FROM emp, dept
WHERE emp.deptno=dept.deptno;

OPERATION OPTIONS OBJECT_NAME COST


------------------------------ ------------------------------ ------------------------------ ----------
SELECT STATEMENT 1100
HASH JOIN 1100
TABLE ACCESS FULL DEPT 1
TABLE ACCESS FULL EMP 822

Query 2 (a cluster exists with deptno as the cluster key)

SELECT empno,ename,dname
FROM emp,dept
WHERE emp.deptno=dept.deptno;

OPERATION OPTIONS OBJECT_NAME COST


------------------------------ ------------------------------ ------------------------------ ----------
SELECT STATEMENT 879
NESTED LOOPS 879
TABLE ACCESS FULL DEPT 879
TABLE ACCESS CLUSTER EMP

Inference

The above execution plans for the join query show a reduction in cost of about 20% in
favour of clustered tables.
Hence it is advisable to use clusters for tables which are predominantly used together in
a join condition. See the Addendum (Note 2) for the capabilities and limitations of clusters.
GUIDELINE 8:

8. USE DUMMY WHERE CLAUSES TO FORCE USE OF INDEXES ON CHARACTER COLUMNS.

Generating an ordered output of the ENAME column (a varchar column). The ENAME
column has a non-unique index (ENAME_INDX).

Query 1

SELECT ename
FROM emp
ORDER BY ename;

OPERATION OPTIONS OBJECT_NAME COST


------------------------------ ------------------------------ ------------------------------ ----------
SELECT STATEMENT 8915
SORT ORDER BY 8915
TABLE ACCESS FULL EMP 822

Query 2

SELECT ename
FROM emp
WHERE ename > TO_CHAR(0);

OPERATION OPTIONS OBJECT_NAME COST


------------------------------ ------------------------------ ------------------------------ ----------
SELECT STATEMENT 4
INDEX FAST FULL SCAN ENAME_INDX 4

Inference

The cost is reduced by an incredible 99% when the second query is used. It is clear from
the above example that, for the plain ORDER BY, the Oracle optimizer ignores the index
created on the character column. Hence it is necessary to explicitly force the usage of the
index to reduce the I/O cost. The use of a dummy WHERE clause, as shown in the
second query, is effective in such cases.

NOTE: - This property holds for nullable columns (even nullable number columns).
However, it does not hold for NOT NULL number columns: if an index is available on
such a column, Oracle uses it when necessary.

GUIDELINE 9:

9. USE FUNCTION BASED INDEXES ON COLUMNS WHICH ARE OFTEN USED WITH A FUNCTION.

Details of all those employees whose SALARY*COMMISSION is greater than 5000 are to
be displayed.

Query 1 (No Function based index [FBI] created)

SELECT * FROM emp
WHERE (sal * NVL(comm,0)) > 5000;

OPERATION OPTIONS OBJECT_NAME COST


------------------------------ ------------------------------ ------------------------------ ---------
SELECT STATEMENT 822
TABLE ACCESS FULL EMP 822

Query 2 (A FBI has been created)


SELECT * FROM emp
WHERE (sal * NVL(comm,0)) > 5000;

OPERATION OPTIONS OBJECT_NAME COST


------------------------------ ------------------------------ ------------------------------ ----------
SELECT STATEMENT 10
TABLE ACCESS BY INDEX ROWID EMP 10
INDEX RANGE SCAN SAL_COMM_FINX 2

Inference

With an FBI, the processing cost is reduced by 98%. Avoid doing calculations on indexed
columns. When the optimizer encounters a calculation on an indexed column, it will not
use the index and will perform a full table scan. If a need arises to perform such
calculations and specify them in the WHERE clause, create an FBI. See the Addendum
(Note 3) for the capabilities of FBIs.

GUIDELINE 10:

10. AVOID USING FUNCTIONS LIKE TRUNC ON INDEXED COLUMNS.

The detailed list of all those employees who have been hired today has to be displayed.
An index exists on the HIREDATE column of the emp table.

Query 1

SELECT * FROM emp
WHERE TRUNC(hiredate) = TRUNC(sysdate);

OPERATION OPTIONS OBJECT_NAME COST


------------------------------ ------------------------------ ------------------------------ ----------
SELECT STATEMENT 822
TABLE ACCESS FULL EMP 822

Query 2

SELECT * FROM emp
WHERE hiredate BETWEEN TRUNC(sysdate)
AND TRUNC(sysdate)+.99999;

[i.e. hiredate between 12:00:00 AM and 11:59:59 PM]

OPERATION OPTIONS OBJECT_NAME COST


------------------------------ ------------------------------ ------------------------------ ----------
SELECT STATEMENT 6
TABLE ACCESS BY INDEX ROWID EMP 6
INDEX RANGE SCAN EMP_HDT_IDX 2
Inference

The cost in the second case is reduced by 99% compared with the first case. As said
before, the use of functions on indexed columns disables the use of the indexes. The
above query shows a way to overcome such a problem.

GUIDELINE 11:

11. CONSIDER THE USE OF BIND VARIABLES

It is always a good practice to use bind variables. It is also advisable to be consistent in
the manner in which statements are typed (i.e. consistent use of upper and lower case,
uniform use of spaces, etc.). These practices help Oracle save the time and resources
spent on re-processing identical statements.

Example1:

Select * from emp;


Select * from emp;
SELECT * FROM EMP;

For the above queries, which perform the same task, parsing will be done three times
because Oracle considers them to be three separate queries. Therefore the queries
should be written in a consistent manner, as shown below, so that the next time an
identical query is issued Oracle uses the already parsed statement.

SELECT * FROM emp;

Example2:

SELECT * FROM emp
WHERE empno=7839;

SELECT * FROM emp
WHERE empno=7844;

For the above queries bind variables should be used so as to make Oracle reuse the
same statement.

SELECT * FROM emp
WHERE empno=:bind_var;
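
In SQL*Plus, for instance, the bind variable can be declared and reused as follows (a minimal sketch):

-- Declare the bind variable and give it a value
VARIABLE bind_var NUMBER
EXEC :bind_var := 7839

-- Each execution below reuses the same parsed statement from the shared
-- pool; only the value of :bind_var changes
SELECT * FROM emp WHERE empno = :bind_var;

EXEC :bind_var := 7844
SELECT * FROM emp WHERE empno = :bind_var;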

GUIDELINE 12:

12. CREATE INDEXES ON COLUMNS, WHICH ARE OFTEN USED FOR SORTING PURPOSES.
Indexes should be used when writing queries that sort data. Consider the following
situation: in the first query there is no index on EMPNO, and the query sorts the rows
on EMPNO.

Query 1 (without index on EMPNO)

SELECT empno FROM emp
ORDER BY empno DESC;

OPERATION OPTIONS OBJECT_NAME COST


------------------------------ ------------------------------ ------------------------------ ----------
SELECT STATEMENT 7906
SORT ORDER BY 7906
TABLE ACCESS FULL EMP 822

Query 2 (with index on EMPNO)

SELECT empno FROM emp
ORDER BY empno DESC;

OPERATION OPTIONS OBJECT_NAME COST


------------------------------ ------------------------------ ----------------------------- ----------
SELECT STATEMENT 26
INDEX FULL SCAN EMP_EMPNO_IDX 26

Inference

In the above example, Oracle does a full table scan of the table, as it does not find any
index on the EMPNO column. The cost incurred here is huge. After the index is
created, Oracle does a full scan of the index and the cost incurred is reduced by more
than 99%.
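
The index referred to in the second plan might be created as shown below (a minimal sketch).

-- With this index in place, ORDER BY empno can be satisfied by a scan of
-- the index instead of a full table scan followed by a sort
CREATE INDEX emp_empno_idx ON emp (empno);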

GUIDELINE 13:

13. USE EXISTS IN PLACE OF JOINS WHERE THE NUMBER OF SUCCESSFUL ROWS RETURNED FROM THE DRIVING TABLE IS SMALL.

All employees who have a valid department ID are to be displayed (i.e. their
corresponding DEPTNO must exist in the DEPT table). Here a primary key index is
present on the DEPTNO column of the DEPT table.

NOTE: - The need to run this query may arise if referential constraints are not in
place.

Query 1
SELECT E.empno,E.ename,D.deptno
FROM emp E,dept D
WHERE E.deptno = D.deptno;

OPERATION OPTIONS OBJECT_NAME COST


------------------------------ ------------------------------ ------------------------------ ----------
SELECT STATEMENT 822
NESTED LOOPS 822
TABLE ACCESS FULL EMP 822
INDEX UNIQUE SCAN DEPT_PRIMARY_KEY

Query 2

SELECT E.empno,E.ename,E.deptno
FROM emp E
WHERE EXISTS (SELECT deptno
FROM dept D
WHERE D.deptno = E.deptno);

OPERATION OPTIONS OBJECT_NAME COST


------------------------------ ------------------------------ ------------------------------ ----------
SELECT STATEMENT 822
FILTER
TABLE ACCESS FULL EMP 822
INDEX UNIQUE SCAN DEPT_PRIMARY_KEY 1

Inference

The cost of processing the query is reduced by almost 33% when EXISTS is used in the
query instead of a join condition. Consider using EXISTS instead of joining the tables if
the percentage of successful rows returned from the driving table (i.e. the number of rows
that need to be validated against the subquery) is small. In such a case, a table join will
be inefficient.

GUIDELINE 14:

14. USE EITHER ROWID OR AN INDEXED COLUMN IN COUNT FUNCTION.

The number of employees in the EMP table is to be found.

Query 1

SELECT COUNT(job)
FROM emp;

OPERATION OPTIONS OBJECT_NAME COST


------------------------------ ------------------------------ ------------------------------ ----------
SELECT STATEMENT 822
SORT AGGREGATE
TABLE ACCESS FULL EMP 822

Query 2

SELECT COUNT(ROWID) FROM emp;

or

SELECT COUNT(empno) FROM emp;

OPERATION OPTIONS OBJECT_NAME COST


------------------------------ ------------------------------ ------------------------------ ----------
SELECT STATEMENT 4
SORT AGGREGATE
INDEX FAST FULL SCAN EMP_EMPNO_IDX 4

Inference

The estimated processing costs are reduced by more than 99% in the second case. While
using the COUNT function in a select statement, make sure that you use either ROWID
or an indexed column; this improves performance and keeps the cost incurred low. This
is because if COUNT is done on a non-indexed column, the Oracle optimizer will opt for a
FULL table scan rather than use an index existing on some other column. Also, if
COUNT(*) is used, Oracle will first resolve the columns of the table and then decide on
the appropriate path for the query.

GUIDELINE 15:

15. USE DECODE WHEREVER POSSIBLE.

The employee details along with their respective department names are to be
displayed. This approach is only feasible if the second table has a very small number
of rows.

Query 1

SELECT E.empno,E.ename,D.dname DEPT_NAME
FROM emp E,dept D
WHERE E.deptno=D.deptno;

OPERATION OPTIONS OBJECT_NAME COST


------------------------------ ------------------------------ ------------------------------ ----------
SELECT STATEMENT 1100
HASH JOIN 1100
TABLE ACCESS FULL DEPT 1
TABLE ACCESS FULL EMP 822
Query 2

SELECT E.empno, E.ename,
DECODE(E.deptno,10,'ACCOUNTING',20,'RESEARCH',30,'SALES','OPERATIONS') DEPT_NAME
FROM emp E;

OPERATION OPTIONS OBJECT_NAME COST


------------------------------ ------------------------------ ------------------------------ ----------
SELECT STATEMENT 822
TABLE ACCESS FULL EMP 822

Inference

The reduction in the estimated cost is about 25% in the second case. Use the DECODE
statement to reduce processing and to avoid having to scan the same rows repetitively.
This increases performance and reduces the cost incurred. This method, however, is
feasible only if the second table is small.

GUIDELINE 16:

16. USE PARTITIONS FOR LARGE TABLES FOR FASTER RETRIEVAL OF DATA.

Select the EMPNO, ENAME and DNAME from the EMP and DEPT tables where the deptno
is 10 or 30. This example demonstrates the efficiency of partitioning large tables.

Query 1

SELECT E.empno,E.ename,D.deptno,D.dname
FROM emp E,dept D
WHERE E.deptno=D.deptno AND E.deptno IN (10,30);

OPERATION OPTIONS OBJECT_NAME COST


------------------------------ ------------------------------ ------------------------------ ----------
SELECT STATEMENT 1047
HASH JOIN 1047
TABLE ACCESS FULL DEPT 1
TABLE ACCESS FULL EMP 822

Query 2

SELECT E.empno,E.ename,D.deptno,D.dname
FROM emp_part E,dept D
WHERE E.deptno=D.deptno AND E.deptno IN (10,30);
OPERATION OPTIONS OBJECT_NAME COST
------------------------------ ------------------------------ ------------------------------ ----------
SELECT STATEMENT 851
HASH JOIN 851
TABLE ACCESS FULL DEPT 1
PARTITION RANGE INLIST
TABLE ACCESS FULL EMP_PART 612

Inference

The cost comes down by about 20% when a partitioned table is used. Large tables can
be partitioned based either on RANGE or HASH. Range partitioning is useful when the
table data can be distributed among many logical ranges based on the values of a
particular column.

In the above example, EMP_PART is the partitioned table, which is partitioned by range
on the column DEPTNO into four partitions. The cost is reduced when the partitioned
table is used since the rows are retrieved in parallel from the partitions. But such a
feature is useful only if the tables are very large and selection is done most often on the
partitioning column. Also, the number of partitions should not be too high. Performance
can be further improved by incorporating partitioned indexes. This feature is particularly
valuable in data warehouse databases. See the Addendum (Note 4) for the syntax for
creating partitions.
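
As an illustration, the EMP_PART table used above might have been defined along the following lines. This is a hypothetical sketch; the partition names and boundary values are illustrative only.

-- Range-partitioned copy of EMP; a query on DEPTNO 10 or 30 needs to scan
-- only the partitions that can contain those values
CREATE TABLE emp_part (
empno NUMBER(4) NOT NULL,
ename VARCHAR2(10),
job VARCHAR2(9),
mgr NUMBER(4),
hiredate DATE,
sal NUMBER(7,2),
comm NUMBER(7,2),
deptno NUMBER(2) NOT NULL )
PARTITION BY RANGE (deptno)
( PARTITION dept_p1 VALUES LESS THAN (20),
PARTITION dept_p2 VALUES LESS THAN (30),
PARTITION dept_p3 VALUES LESS THAN (40),
PARTITION dept_p4 VALUES LESS THAN (MAXVALUE) );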

GUIDELINE 17:

17. USE INDEX-ORGANIZED TABLES FOR QUERIES INVOLVING EXACT MATCH AND RANGE SEARCHES ESPECIALLY ON THE PRIMARY KEY.

Select the employee details of all employees having EMPNOs less than 1000.

Query 1

SELECT empno,ename
FROM emp
WHERE empno < 1000;

OPERATION OPTIONS OBJECT_NAME COST


------------------------------ ------------------------------ ------------------------------ ----------
SELECT STATEMENT 822
TABLE ACCESS FULL EMP 822

Query 2

SELECT empno,ename
FROM empi
WHERE empno < 1000;
OPERATION OPTIONS OBJECT_NAME COST
------------------------------ ------------------------------ ------------------------------ ----------
SELECT STATEMENT 10
INDEX RANGE SCAN EMPI_EMPNO_PK 2

Inference

The processing costs are reduced by almost 99% in the case of the second query. In the
above example, both EMP and EMPI have 9.5 lakh (950,000) rows with the same data
and have EMPNO as the primary key. The only difference is that the EMPI table is an
INDEX-ORGANIZED TABLE.
See the Addendum (Note 5) for syntax and information on index-organized tables.
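
A hypothetical definition of the EMPI table used above is sketched below; see Note 5 in the Addendum for the general syntax.

-- Index-organized version of EMP: the whole table is stored inside the
-- B*tree built on the primary key EMPNO, so exact-match and range searches
-- on EMPNO read only the index structure
CREATE TABLE empi (
empno NUMBER(4) PRIMARY KEY,
ename VARCHAR2(10),
job VARCHAR2(9),
mgr NUMBER(4),
hiredate DATE,
sal NUMBER(7,2),
comm NUMBER(7,2),
deptno NUMBER(2) )
ORGANIZATION INDEX;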

Addendum

Note1

1. Hash Joins:

Hash joins are used for joining large data sets. The optimizer uses the smaller of the two
tables/data sources to build a hash table on the join key in memory. It then scans the
larger table, probing the hash table to find the joined rows. This operation involves joining
two sets of rows and returning the result. Apart from this, the result may need to be
SORTED for unique values, i.e. an operation involving sorting a set of rows to eliminate
duplicates. If the smaller table is small enough to fit into memory, then the cost is limited
to a single read pass over the data for the two tables. But if the hash table is too big to fit
into memory, the optimizer resorts to partitioning it.

When the Optimizer Uses Hash Joins:


The optimizer uses a hash join to join two tables if they are joined using an equijoin and if
either of the following is true:
1. A large amount of data needs to be joined.
2. A large fraction of the table needs to be joined.

Hash join example:
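
One such case is the unclustered EMP/DEPT join already seen under Guideline 7, repeated here as a sketch; its plan showed DEPT being built into the hash table and EMP probing it.

-- Equijoin on DEPTNO: the optimizer builds an in-memory hash table on the
-- smaller table (DEPT) and probes it with every row of the larger table (EMP)
SELECT empno, ename, dname
FROM emp, dept
WHERE emp.deptno = dept.deptno;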


Note2

2. Clusters:

You can create a CLUSTER as follows.

Suppose you want to create a cluster named EMP_DEPT for the EMP and
DEPT tables.

1) First create a cluster named EMP_DEPT.


CREATE CLUSTER emp_dept(deptno number(2));

2) Now create the cluster tables as follows,

CREATE TABLE dept (
deptno NUMBER(2) PRIMARY KEY, . . . )
CLUSTER emp_dept (deptno);

CREATE TABLE emp (
empno NUMBER(5) PRIMARY KEY,
ename VARCHAR2(15) NOT NULL,
...
deptno NUMBER(2) REFERENCES dept)
CLUSTER emp_dept (deptno);

In this cluster, DEPTNO is the cluster key. Be sure to choose your cluster key carefully;
it should represent the join condition between the clustered tables.

Some Points to remember about Clusters:

(i) Clusters should be used for tables predominantly used in join queries; they help in
increasing performance. They should be preferred in data warehouse environments,
where the data in the clustered tables is not likely to change often.

(ii) Clusters can degrade the performance of your database when used in OLTP
(Online Transaction Processing) environments, where the data in the clustered tables is
likely to be modified often.

You are advised to make use of clusters only after properly evaluating the environment
your application will be working in.


Note3

3. Function Based Indexes:

Function Based Index can be created as shown below:

CREATE INDEX sal_comm_finx ON EMP ( sal * nvl(comm,0));

Function-based indexes facilitate queries that qualify a value returned by a
function or expression. The value of the function or expression is pre-computed and
stored in the index. Specific features of function-based indexing include:
(1) Indexes can be created where the search key is an expression.

(2) They provide an efficient mechanism for evaluating predicates involving functions.

(3) Case-insensitive sorts can be performed.

(4) Descending order indexes can be created. They are treated as a special case of
function-based indexes.
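
For instance, the UPPER(ename) search that caused a full table scan in Guideline 2 could be supported with a function-based index along these lines (a sketch; the index name is illustrative):

-- Pre-computes UPPER(ename) so that WHERE UPPER(ename) = 'KING' can use an
-- index range scan instead of a full table scan
CREATE INDEX ename_upper_indx ON emp (UPPER(ename));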


Note4

4. Partitioned Tables:

Partitioned tables can be created as follows.

CREATE TABLE sales ( invoice_no NUMBER,
sale_year INT NOT NULL,
sale_month INT NOT NULL,
sale_day INT NOT NULL )
STORAGE (INITIAL 100K NEXT 50K)
PARTITION BY RANGE ( sale_year, sale_month, sale_day)
( PARTITION sales_q1 VALUES LESS THAN ( 1999, 04, 01 ) TABLESPACE tsa,
PARTITION sales_q2 VALUES LESS THAN ( 1999, 07, 01 ) TABLESPACE tsb,
PARTITION sales_q3 VALUES LESS THAN ( 1999, 10, 01 ) TABLESPACE tsc,
PARTITION sales_q4 VALUES LESS THAN ( 2000, 01, 01 ) TABLESPACE tsd );

Partitioned tables allow your data to be broken down into smaller, more
manageable pieces called partitions, or even subpartitions. Indexes may be
partitioned in similar fashion. Each partition can be managed individually, and can
operate independently of the other partitions, thus providing a structure that can
be better tuned for availability and performance.


Note5

5. Index-Organized Tables:

You can create index-organized tables in the following way.

CREATE TABLE docindex( token CHAR(20),
doc_id NUMBER,
token_frequency NUMBER,
token_offsets VARCHAR2(512),
CONSTRAINT pk_docindex PRIMARY KEY (token, doc_id))
ORGANIZATION INDEX TABLESPACE ind_tbs;

Index-organized tables are like regular tables with a primary key index on
one or more of its columns. However, instead of maintaining two separate storage spaces
for the table and B*tree index, an index-organized table only maintains a single B*tree
index containing the primary key of the table and other column values.
Index-organized tables provide fast key-based access to table data for
queries involving exact match and range searches especially on the primary key as seen
above. Changes to the table data (such as adding new rows, updating rows, or deleting
rows) result only in updating the index structure (because there is no separate table
storage area). Index-organized tables are suitable for accessing data by way of primary
key or any key that is a valid prefix of the primary key. There is no duplication of key
values and storage requirements are reduced because a separate index structure
containing the key values and ROWID is not created.

