SQL Tuning Workshop v1
D12395GC10
Production 1.0
January 2002
D34338
Authors
Nancy Greenberg
Priya Vennapusa

Technical Contributors and Reviewers
Howard Bradley
Laszlo Czinkoczki
Dan Gabel
Connie Dialeris Green
John Hibbard
Lilian Hobbs
John Hoff
Alexander Hunold
Tamas Kerepes
Susan Kotsovolos
Herve Lejeune
Stefan Lindblad
Diana Lorentz
Howard Ostrow
Arjan Pellenkoft
Stacey Procter
Shankar Raman
Mariajesus Senise
Janet Stern
Don Sullivan
Ric Van Dyke
Lachlan Williams

Publisher
Shane Mattimoe

Copyright © Oracle Corporation, 1998, 1999, 2000, 2001, 2002. All rights reserved.

This documentation contains proprietary information of Oracle Corporation. It is provided under a license agreement containing restrictions on use and disclosure and is also protected by copyright law. Reverse engineering of the software is prohibited. If this documentation is delivered to a U.S. Government Agency of the Department of Defense, then it is delivered with Restricted Rights and the following legend is applicable:

Restricted Rights Legend

Use, duplication or disclosure by the Government is subject to restrictions for commercial computer software and shall be deemed to be Restricted Rights software under Federal law, as set forth in subparagraph (c)(1)(ii) of DFARS 252.227-7013, Rights in Technical Data and Computer Software (October 1988).

This material or any portion of it may not be copied in any form or by any means without the express prior written permission of Oracle Corporation. Any other copying is a violation of copyright law and may result in civil and/or criminal penalties.

If this documentation is delivered to a U.S. Government Agency not within the Department of Defense, then it is delivered with “Restricted Rights,” as defined in FAR 52.227-14, Rights in Data-General, including Alternate III (June 1987).

The information in this document is subject to change without notice. If you find any problems in the documentation, please report them in writing to Education Products, Oracle Corporation, 500 Oracle Parkway, Box SB-6, Redwood Shores, CA 94065. Oracle Corporation does not warrant that this document is error-free.

All references to Oracle and Oracle products are trademarks or registered trademarks of Oracle Corporation.

All other products or company names are used for identification purposes only, and may be trademarks of their respective owners.
Contents
Instructor Preface iii
3 EXPLAIN and AUTOTRACE
Objectives 3-2
Creating the Plan Table 3-3
The EXPLAIN PLAN Command 3-5
EXPLAIN PLAN Example 3-6
Displaying the Execution Plan 3-7
Interpreting the Execution Plan 3-9
Using V$SQL_PLAN 3-11
V$SQL_PLAN Columns 3-12
Querying V$SQL_PLAN 3-14
SQL*Plus AUTOTRACE 3-15
SQL*Plus AUTOTRACE Examples 3-16
SQL*Plus AUTOTRACE Statistics 3-17
Summary 3-18
Practice Overview 3-19
4 SQL Trace and TKPROF
Objectives 4-2
SQL Trace Facility 4-3
How to Use the SQL Trace Facility 4-4
Initialization Parameters 4-5
Switching On SQL Trace 4-7
Finding Your Trace Files 4-8
Formatting Your Trace Files 4-9
TKPROF Command Options 4-11
Output of the TKPROF Command 4-12
TKPROF Output Example: No Index 4-17
TKPROF Output Example: Unique Index 4-18
Some TKPROF Interpretation Pitfalls 4-19
Summary 4-20
Practice Overview 4-21
5 Rule-Based Optimization Versus Cost-Based Optimization
Objectives 5-2
Overview 5-3
Functions of the Oracle9i Optimizer 5-4
Rule-Based Optimization 5-5
Cost-Based Optimization 5-6
Choosing Between RBO and CBO 5-8
Setting the Optimizer Approach 5-9
First Rows Optimization 5-10
Rule-Based Optimization 5-11
RBO Ranking Scheme 5-12
Rule-Based Optimization Example 5-13
Influencing Rule-Based Optimization 5-14
Summary 5-15
Practice Overview 5-16
Guided Practice Page 5-17
6 Indexes and Basic Access Methods
Objectives 6-2
ROWIDs 6-3
Indexes 6-4
B*-Tree Indexes 6-5
B*-Tree Index Structure 6-6
B*-Tree Index Example 6-7
CREATE INDEX Syntax 6-8
Composite Index Guidelines 6-9
Index Statistics 6-10
Effect of DML Operations on Indexes 6-11
Indexes and Constraints 6-12
Indexes and Foreign Keys 6-13
Basic Access Methods 6-14
Skip Scanning of Indexes 6-15
Identifying Unused Indexes 6-18
Enabling and Disabling the Monitoring of Index Usage 6-19
Clusters 6-20
Cluster Example 6-21
Index Clusters 6-22
Index Clusters: Performance Characteristics 6-23
Index Clusters: Limitations and Guidelines 6-24
Hash Clusters 6-25
Hash Clusters: Limitations 6-26
When to Use Clusters 6-27
Summary 6-29
7 Collecting Statistics
Objectives 7-2
The ANALYZE Command 7-3
Table Statistics 7-5
Index Statistics 7-7
Column Statistics 7-10
The DBMS_STATS Package 7-12
DBMS_STATS: Generating Statistics 7-13
Copy Statistics Between Databases 7-14
Example: Copying Statistics 7-15
Example: Gathering Statistics 7-16
Predicate Selectivity 7-17
Bind Variables and Predicate Selectivity 7-18
Histograms 7-20
Histograms and Selectivity 7-21
Histogram Statistics: Example 7-22
Histogram Tips 7-24
When to Use Histograms 7-25
Choosing a Sample Size 7-26
Choosing the Number of Buckets 7-27
Viewing Histogram Statistics 7-28
Summary 7-29
Practice Overview 7-30
8 Influencing the Optimizer
Objectives 8-2
Setting the Optimizer Mode 8-3
Some Additional Parameters 8-4
Optimizer Hint Syntax 8-6
Rules for Hints 8-7
Hint Recommendations 8-8
Optimizer Hint Example 8-9
Hint Categories 8-10
Basic Access Path Hints 8-11
Advanced Access Path Hints 8-13
Buffer Cache Hints 8-14
Hints and Views 8-15
View Processing Hints 8-17
Summary 8-18
Practice Overview 8-19
9 Sorting and Joining
Objectives 9-2
Tuning Sort Performance 9-3
Top-N SQL 9-4
Join Terminology 9-5
Join Operations 9-7
Nested Loops Joins 9-8
Nested Loops Join Plan 9-9
Sort/Merge Joins 9-10
Sort/Merge Join Plan 9-11
Hash Joins 9-12
Hash Join Plan 9-13
Joining Multiple Tables 9-14
Outer Joins 9-15
SQL: 1999 Outer Joins 9-16
Full Outer Joins 9-17
Execution of Outer Joins 9-18
The Optimizer and Joins 9-19
Join Order Rules 9-20
RBO Join Optimization 9-21
CBO Join Optimization 9-23
Estimating Join Costs 9-24
Star Joins 9-25
Hints for Join Orders 9-27
Hints for Join Operations 9-28
Other Join Hints 9-30
Subqueries and Joins 9-31
Initialization Parameters that Influence Joins 9-33
Throwaway of Rows 9-34
Minimize Throwaway of Rows 9-35
Minimize Processing 9-36
Summary 9-37
10 Optimizer Plan Stability
Objectives 10-2
Optimizer Plan Stability 10-3
Plan Equivalence 10-4
Creating Stored Outlines 10-5
Using Stored Outlines 10-6
Data Dictionary Information 10-7
Execution Plan Logic 10-8
Maintaining Stored Outlines 10-9
Outline Editing Overview 10-11
Editable Attributes 10-13
Outline Cloning 10-14
Outline: Administration and Security 10-15
Configuration Parameters 10-17
Create Outline Syntax 10-18
Outline Cloning Examples 10-19
Summary 10-22
Practice Overview 10-23
11 Advanced Indexes
Objectives 11-2
Bitmapped Indexes 11-3
Bitmapped Index Structure 11-4
Creating Bitmapped Indexes 11-5
Using Bitmapped Indexes for Queries 11-7
Combining Bitmapped Indexes 11-8
When to Use Bitmapped Indexes 11-9
Advantages of Bitmapped Indexes 11-10
Bitmapped Index Guidelines 11-11
What Is a Bitmap Join Index? 11-12
Bitmap Join Index: Advantages and Disadvantages 11-14
Indexes and Row-Access Methods 11-16
Index Hints 11-17
INDEX_COMBINE Hint Example 11-18
Star Transformation 11-20
Star Transformation Example 11-22
Function-Based Indexes 11-24
Function-Based Indexes: Usage 11-25
Data Dictionary Information 11-26
Summary 11-27
12 Materialized Views and Temporary Tables
Objectives 12-2
Materialized Views 12-3
Create Materialized Views 12-4
Refresh Materialized Views 12-5
Materialized Views: Manual Refresh 12-7
Query Rewrites 12-8
Create Materialized Views: Syntax Options 12-11
Enabling and Controlling Query Rewrites 12-12
Query Rewrite Example 12-13
Dimensions: Overview 12-15
Dimensions and Hierarchies 12-16
Dimensions: Example Table 12-17
Dimensions and Hierarchies 12-18
Create Dimensions and Hierarchies 12-19
Dimensions Based on Multiple Tables 12-20
Dimensions with Multiple Hierarchies 12-21
Temporary Tables 12-22
Creating Temporary Tables 12-24
Summary 12-26
13 Alternative Storage Techniques
Objectives 13-2
Storing User Data 13-3
Index-Organized Tables 13-4
IOT Performance Characteristics 13-5
IOT Limitations 13-6
When to Use Index-Organized Tables 13-7
Creating Index-Organized Tables 13-8
IOT Row Overflow 13-9
Retrieving IOT Information 13-10
External Tables 13-11
External Tables Performance Characteristics 13-12
External Tables Limitations 13-13
Why Use External Tables? 13-14
Creating External Tables 13-15
Retrieving External Tables Information 13-16
Summary 13-17
Appendix A: Workshops
Index
Instructor Preface
How This Book Is Organized
Unit 1: Introduction
Lesson 1: Following a Tuning Methodology
This lesson covers the process of SQL statement tuning, including defining a tuning methodology. By applying a tuning methodology, students learn to optimize performance and improve response time and scalability. Tuning is an ongoing process in the life cycle of a project that needs to start in the development phase and continue through the maintenance phase.
Lesson 2: SQL Statement Processing
Students learn the four main stages in processing a SQL statement: Parse, Bind, Execute, and Fetch. The advantages and disadvantages of using bind variables are discussed.
Lesson 3: EXPLAIN and AUTOTRACE
This unit shows students the tools that are available to help test and diagnose performance problems with a SQL statement; this lesson shows the EXPLAIN PLAN command and the SQL*Plus AUTOTRACE feature.
Lesson 4: SQL Trace and TKPROF
This lesson shows the SQL Trace facility. By using SQL Trace, students can evaluate the performance of SQL statements. Trace files can be formatted with TKPROF.
Lesson 5: Rule-Based Optimization Versus Cost-Based Optimization
Students learn to use both rule-based and cost-based optimization. Rule-based optimization is still supported for backward compatibility and uses a ranking scheme. Cost-based optimization was introduced in Oracle7. The optimizer uses statistics to calculate the cost of access paths, taking into account the number of logical I/Os, CPU usage, and network traffic.
Lesson 6: Indexes and Basic Access Methods
This lesson introduces students to B*-tree indexes. An index is a database object that is logically and physically independent of the table data. The Oracle9i Server can use an index to access data required by a SQL statement, or use indexes to enforce integrity constraints.
Unit 3: The Oracle Optimizer (continued)
Lesson 7: Collecting Statistics
This lesson covers table statistics used with the cost-based optimizer. By using the ANALYZE command or the DBMS_STATS package, students learn that they can collect statistics on tables, which are used by the cost-based optimizer to determine optimal execution plans for SQL statements. Histograms are also covered.
Lesson 8: Influencing the Optimizer
This lesson covers how to influence the Oracle optimizer at three levels: the instance level, the session level, and the statement level (using hints).
Lesson 9: Sorting and Joining
This lesson covers Top-N SQL, performance issues with sorting, and how to optimize join statements.
Lesson 10: Optimizer Plan Stability
This lesson covers the use of stored outlines and their influence on optimizer plan stability, and the use of the OUTLN_PKG package to maintain stored outlines.
Lesson 11: Advanced Indexes
Students learn about bitmapped indexes, star transformations, and function-based indexes.
Lesson 12: Materialized Views and Temporary Tables
This lesson shows students materialized views, query rewrites, dimensions and hierarchies, and temporary tables.
Lesson 13: Alternative Storage Techniques
This lesson shows some alternative storage techniques, including index-organized tables, index clusters, and hash clusters.
Lesson 14: Data Warehousing Considerations
This lesson reviews the efficiencies introduced with the Oracle9i Server that enhance performance in data warehousing environments. The WITH clause, multitable INSERT, and MERGE statements are reviewed from a performance perspective.
Following a Tuning Methodology
Objectives
Applying a tuning methodology to your design and tuning efforts optimizes performance.
Tuning SQL is a major component of a tuning methodology.
After completing this lesson, you should be able to:
• State the procedural steps in managing performance
• Describe the causes of performance problems
• Identify the main system areas that you can address by using the tuning process
• Describe the tuning methodology
• Explain the advantage of following the tuning methodology in its proper order
• List the tuning steps that are the responsibility of the application developer
• Performance management
• Performance problems
• The tuning methodology
• SQL statement tuning
• Methodology application
Overview
Anyone involved in tuning should follow a tuning methodology to achieve maximum
performance. Tuning SQL statements is an important step that is least expensive when
performed at its proper moment within the methodology.
In addition to tuning at the right time, you should also have a good understanding of the issues
involved in performance management and the types of performance problems that might arise.
• Start early
• Set objectives
• Tune and monitor conformance
• Work together
• Handle exceptions and changes
Managing Performance
Tuning requires several steps.
Start Early
Performance management must span the application or project continuously to be fully
effective. You should consider it at the design stage.
Set Objectives
Do not attempt to tune everything. Use the ROI (return on investment) methodology to identify
the portions of the application or project to tune. Define your objectives in a form accepted and
agreed to by all the interested parties. Service level agreements (SLAs) form an increasingly
important part of setting the objectives of operations groups and facilities management teams.
You can successfully extend this concept into the specification of application requirements.
It is better to set a measurable objective such as “this report will complete in 5 seconds or less”
than to aim for “all queries will be faster.”
Tune and Monitor Conformance
After you have set out and agreed upon the objectives, you are ready to tune and to reach those
objectives, monitoring your progress as you tune. You should keep detailed records about the
level of conformance with the requirements. You should publish indicative measures at regular
intervals, highlighting any deviations or trends.
• Schema
– Data design
– Indexes
• Application
– SQL statements
– Procedural code
• Instance
• Database
• User expectations
Instructor Note
Point out to the participants that:
• Schema design, as covered in Oracle9i Database Fundamentals for Developers, is a
suggested prerequisite for this course.
• Tuning SQL statements is the focus of this course.
• Tuning procedural code (PL/SQL, 3GLs) is part of learning that particular coding
environment.
• Instance and database tuning are discussed in the Oracle9i Performance Tuning course.
Refer to them briefly, when necessary.
Instructor Note
Before turning to the next page, you could ask for suggestions about performance
dependencies and write them on the board.
Critical Resource
Response time is defined as the service time plus the wait time to accomplish a certain task.
As demand for a resource with a single server increases toward the service rate, queues build up, and the queue length increases exponentially with each further increase in demand.
Even if there are many servers, you can observe the same effect with a single queue. However,
multiple servers provide a valuable smoothing effect when there is a wide variation in the time
for which a resource is occupied by one of its clients before it is available for reallocation.
The goal is to design, engineer, and tune the system so that the load is never permitted to slow
service completion times below the acceptable level.
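As a hedged illustration of why queues degrade so sharply (this is the standard single-server M/M/1 queueing result; it is not derived in the course materials), with mean service time $S$ and utilization $\rho$ (the ratio of arrival rate to service rate), the expected response time is

\[ R = \frac{S}{1 - \rho} \]

so raising utilization from 80% to 90% doubles the expected response time (from 5S to 10S), and raising it to 95% doubles it again (to 20S).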
Excessive Demand
Throughput is defined as the total amount of work accomplished by the system in a given
amount of time. Too many processes using a system simultaneously may result in the
following symptoms.
Increased Response Time
Most users know and understand the effects of queues on increasing response time. They may
be prepared to accept slower response at peak times if the effect is linear. However, both
statistical theory and experience show that when response time starts to deteriorate, small
increases in load can have a severe effect, which is unacceptable to users.
Scalability
Scalability is a system’s ability to process more workload, with a proportional increase in
system resource usage. In other words, in a scalable system, if you double the workload, then the system should use no more than twice as many system resources.
Examples of bad scalability due to resource conflicts include the following:
• Applications requiring significant concurrency management as user populations increase
• Increased locking activities
• Increased data consistency workload
• Increased operating system workload
• Transactions requiring increases in data access as data volumes increase
• Poor SQL and index design resulting in a higher number of logical I/Os for the same
number of rows returned
• Reduced availability because database objects take longer to maintain
Understanding Scalability
If an application exhausts a system resource to the point where no more throughput is possible when its workload is increased, the application is said to be unscalable. This can result in fixed throughputs and poor response times.
Internet Scalability
Applications must scale with the increase in workload and also when additional hardware is
added to support increasing demand. Design errors can cause the implementation to reach its
maximum, regardless of additional hardware resources or redesign efforts.
Internet applications are challenged by very short development timeframes with limited time
for testing and evaluation. A badly designed application generally means that at some point in
the future, the system will need to be rearchitected or reimplemented. From a business
perspective, poor performance can mean a loss of customers. If a Web user does not get a
response in seven seconds, then the user’s attention could be lost forever.
In many cases, the cost of redesigning a system, together with the associated downtime costs of migrating to a new implementation, exceeds the cost of properly building the original system.
It is critical to design and implement with scalability in mind from the start.
Tuning Roles
The business analyst, designer, application developer, database administrator, and operating
system administrator are responsible for different steps in the tuning process. In some cases, one
person may fulfill several of these roles.
The steps performed by the database administrator and operating system administrator have a
lesser effect on performance than earlier steps, but they can be performed at relatively low cost
with immediately available and observable results.
The fourth step in the methodology is primarily the responsibility of the application developer.
However, understanding how to tune SQL can enable designers to design schemas that tune
easily. With this understanding, database administrators can help pinpoint SQL tuning needs
and solutions, thus easing the burden on their databases. This is especially useful if the
application is already in production and the application developer is no longer available.
A large part of tuning in client-server or Web environments relates to network issues, as well.
Quantifiable Objectives
Establish quantifiable objectives and avoid vague ones. An example of an acceptable objective
would be the following:
“We must be able to support up to 20 operators, each entering 20 orders per hour, and the
picking lists must be produced within 30 minutes of the end of the shift.”
Minimum Repeatable Test
If you are trying to cut a four-hour run down to two hours, repeated timings take too long.
Perform your initial trials against a test environment similar to the real one, or restrict the test with a limiting condition (such as processing one department instead of all 500). The
ideal test case should run for more than one minute so that improvements are demonstrated
intuitively and can be measured using timing features.
• Ask questions of affected users and avoid preconceptions.
• Test hypotheses and keep notes.
• Stop when you meet the target.
Summary
This lesson introduced you to following a tuning methodology. By applying a tuning
methodology, you can optimize performance, and improve response time and scalability.
Tuning is an ongoing process in the life cycle of a project that must start in the development phase and continue through the maintenance phase.
• Manage performance at an early stage
• Identify performance problems
• Determine tuning roles
• Tune your SQL statements
• Apply the tuning methodology
Summary (continued)
Tuning SQL Statements
Tuning SQL statements involves using analysis tools, SQL tuning techniques, and the optimizer
effectively. In addition, tuning the schema can create additional access paths for the optimizer,
such as the use of an index.
Objectives
Knowing how the Oracle9i Server processes SQL statements can help you design SQL
statements that can reuse parsing and optimization efforts.
After completing this lesson, you should be able to:
• Describe the basic steps involved in processing a SQL statement
• Monitor the use of shared SQL areas
• Write a SQL statement to take advantage of shared SQL areas (this enables you to reuse parsing and optimization efforts)
• Understand how and when to set the CURSOR_SHARING parameter to SIMILAR or FORCE
• Use automatic PGA memory management
Overview
By knowing about shared SQL areas, the SQL processing phases, and shared cursors, you can
understand how coding standards enable you to minimize how often your statements must be
parsed and optimized.
Additionally, as the statements you write become more and more standardized, you can
identify statements that occur frequently and devote additional tuning to them.
[Slide: the SGA shared pool contains the data dictionary cache and the library cache; the library cache holds shared SQL statements, parsed or compiled PL/SQL blocks, and Java classes. Application code communicates with a server process, which has its own PGA.]
PGA
A program global area (PGA) is a memory region containing data and control information for
a single process (server or background). Access to it is exclusive to that server process and is
read and written only by the Oracle code acting on behalf of it. PGA is a nonshared memory
area to which a process can write. One PGA is allocated for each server process. A PGA is
allocated by the Oracle server when a user connects to an Oracle database and a session is
created, though this varies by operating system and configuration.
An example of such information is the run-time area of a cursor. Each time a cursor is
executed, a new run-time area is created for that cursor in the PGA. With complex queries, a
big portion of the run-time area is dedicated to work areas allocated by memory-intensive
operators such as sort-based operators. A sort operator uses a work area (the sort area) to
perform the in-memory sort of a set of rows.
The size of the work area can be controlled and tuned. Ideally, the size of a work area is big
enough that it can accommodate the input data and auxiliary memory structures allocated by
its associated SQL operator. This is called the optimal size of a work area. When the size of
the work area is smaller than optimal, response time increases, because an extra pass is
performed over part of the input data. This is called the one-pass size of the work area. Under
a one-pass threshold, when the size of a work area is much too small compared to the input
data size, multiple passes over the input data are needed. This is called the multipass size of
the work area.
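The objectives for this lesson mention automatic PGA memory management; as a brief hedged sketch (the parameter names are as documented for Oracle9i, and the target value is an illustrative assumption), the total memory available to work areas can be managed automatically instead of tuning individual parameters such as SORT_AREA_SIZE and HASH_AREA_SIZE:

# In the initialization parameter file:
PGA_AGGREGATE_TARGET = 100M
WORKAREA_SIZE_POLICY = AUTO

SQL> ALTER SYSTEM SET pga_aggregate_target = 100M;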
SQL Statement Processing Phases
[Diagram: a cursor is opened; the statement passes through the Parse, Bind, Execute, and Fetch phases; and the cursor is closed. Reverse arrows indicate re-bind and re-execute scenarios.]
Processing Phases
The four most important phases in SQL statement processing are parsing, binding, executing,
and fetching.
The reverse arrows indicate processing scenarios; for example, Fetch—(Re)Bind—Execute—
Fetch.
The Fetch phase applies only to queries and DML statements with a returning clause.
Note: A detailed description of SQL statement processing can be found in the Oracle9i Application Developer’s Guide - Fundamentals, “How Oracle Processes SQL Statements,” and in Oracle9i Concepts, “SQL and PL/SQL.”
Instructor Note
There is no real Rebind phase; the same pointer is simply used for a re-Execute. The value
found at the corresponding memory address is used; Oracle9i uses a “bind by reference”
technique.
Three more phases are missing from this slide: Define, Describe, and Parallelize.
Define and Describe are not very important for tuning. Parallelize is not mentioned here
because it is discussed in the Oracle9i: Implementing Scalable Systems course.
• Parse
– Searches for the statement in the shared pool
– Checks syntax
– Checks semantics and privileges
– Merges view definitions and subqueries
– Determines execution plan
Parse Phase
The Oracle9i Server does the following:
• Searches for the statement in the shared pool
• Checks the statement syntax, given the grammar and specifications of the SQL language
• Checks the semantics, ensuring that objects referenced in the SQL statement are valid
and satisfy security constraints
• Determines whether the process issuing the statement has appropriate privileges to
execute it
• Transforms a SQL statement on a view into an equivalent SQL statement on its
underlying definition, and attempts to simplify a statement with a subquery by rewriting
it into a join
• Determines and stores the execution plan, or uses an existing execution plan, if possible
Note: The Parse phase also checks whether materialized views can be used. Materialized
views are discussed in a subsequent lesson.
Instructor Note
Determining the execution plan is the most costly part of the parsing and may take longer than
executing the plan.
• Bind
– Scans the statement for bind variables
– Assigns (or reassigns) a value
Bind Phase
• The Oracle9i Server checks the statement for references to bind variables.
• The Oracle9i Server assigns or reassigns a value to each variable.
Note: This phase order implies that the Oracle9i Server does not know bind variable values
when optimizing a statement. This enables a fast rebind-execute without the need for
reparsing, thus saving time and memory; a disadvantage is that it is impossible for the
optimizer to estimate predicate selectivity. This will be discussed in more detail in the
“Collecting Statistics” lesson.
Instructor Note (for page 2-11)
Note the subtle difference between sorts for DML statements (during the Execute phase) and
queries (during the first fetch). There is no need to stress this, but students might ask.
With an INSERT, UPDATE, or DELETE statement, executing the statement results in the
SQL being run immediately during the Execute phase.
When a SELECT statement where rows are sorted or locked is issued, opening the cursor
retrieves all of the rows to be returned. If there is no locking or sorting, opening the cursor
locates the record pointer at the first row.
• Execute
– Applies the execution plan
– Performs necessary I/O and sorts for data
manipulation language (DML) statements
• Fetch
– Retrieves rows for a query
– Sorts for queries when needed
– Uses an array fetch mechanism
Execute Phase
• The Oracle9i Server applies the parse tree to the data buffers.
• Multiple users can share the same parse tree.
• The Oracle9i Server performs physical reads or logical reads/writes for DML statements
and also sorts the data when needed.
Fetch Phase
The Oracle9i Server retrieves rows for a SELECT statement during the Fetch phase. Each
fetch typically retrieves multiple rows, using an array fetch.
Each Oracle tool offers its own ways of influencing the array size; in SQL*Plus you do so by
using the ARRAYSIZE setting:
SQL> show arraysize
arraysize 15
SQL> set arraysize 1
With this setting, SQL*Plus will process one row at a time. The default value is 15.
• Reduces parsing
• Dynamically adjusts memory
• Improves memory usage
Bind Variables
If two bind variables have different data types, then the statements are not identical. If the
bind-variable data types match but their names are not identical, as in the example above,
there is no problem, because bind variables will be renamed internally. The first variable is
always called :b1, the second is :b2, and so on.
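As a hedged illustration (the course demo script is not reproduced here; the CUSTOMERS columns follow the sample schema used elsewhere in this course), a PL/SQL block issues its embedded SQL with internally renamed bind variables:

SQL> DECLARE
  2    v_id    customers.cust_id%TYPE := 180;
  3    v_name  customers.cust_first_name%TYPE;
  4  BEGIN
  5    SELECT cust_first_name INTO v_name
  6      FROM customers
  7     WHERE cust_id = v_id;
  8  END;
  9  /

A second block that declares, say, v_customer instead of v_id issues the identical shared statement SELECT CUST_FIRST_NAME FROM CUSTOMERS WHERE CUST_ID = :B1, so both executions share one cursor.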
Instructor Note
Demo scripts: demo02_01.sql
demo02_01cursors.sql
Note that the script actually uses the PL/SQL environment of SQL*Plus. SQL*Plus itself does
not translate bind-variable references in SQL statements.
Note that starting with version 8.1.6, there is a powerful feature to force cursor sharing: you can set CURSOR_SHARING=FORCE at the session or instance level. This replaces all literals in your SQL statements with system-generated bind variables. This option is discussed at the end of the lesson.
[Slide: a shared-cursor listing for the statement select * from customers where cust_id = 180, with its cursor statistics (1 1 0 1 0).]
Cursor Sharing
The parsing phase compares the text of the statement with existing statements in the shared
pool to see whether the statement can be shared. If the statement differs textually in any way,
then Oracle does not share the statement.
The CURSOR_SHARING parameter enables the DBA to specify the level of matching that
will be considered acceptable by the SQL parser. The default value is EXACT, which forces
the parser to seek a statement that exactly matches the current statement before reusing an
existing cursor.
Setting CURSOR_SHARING to SIMILAR allows the parser to use a cursor for a statement
that is identical in all aspects other than literal values. After the statement is identified, the
current statement will continue to be parsed to ensure that the cursor’s execution plan is
applicable to the current statement. CURSOR_SHARING = FORCE makes the parser behave
the same way as for SIMILAR, except the execution plan in the cursor is always used,
regardless of its applicability.
CURSOR_SHARING = SIMILAR or FORCE is not recommended for certain environments
such as DSS or for complex queries.
See the Oracle9i Database Performance Guide and Reference for more information.
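A brief hedged sketch of the effect (the queries are illustrative; :"SYS_B_0" is the system-generated bind-variable form):

SQL> ALTER SESSION SET cursor_sharing = FORCE;
SQL> SELECT * FROM customers WHERE cust_id = 180;
SQL> SELECT * FROM customers WHERE cust_id = 278;

Both statements now share a single cursor, stored in the shared pool as:
SELECT * FROM customers WHERE cust_id = :"SYS_B_0"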
Summary
Use identical SQL statements to reuse SQL areas and thus increase performance.
Realistically, this will be easiest if you use shared code such as packaged procedures, because
the same code will be executed multiple times. This also increases maintainability of the code
and helps narrow your tuning efforts to one set of shared code instead of multiple, scattered
similar SQL statements.
Processing Phases
There are four main stages in processing a SQL statement:
• In the Parse stage, the Oracle9i Server checks the syntax of the statement, determines the
execution plan if the statement is not already in the shared SQL area, and verifies the
information in the data dictionary.
• In the Bind stage, any unresolved bind variables are assigned a value.
• In the Execute stage, the Oracle9i Server executes the statement performing required
reads and writes (except for queries).
• In the Fetch stage (for queries only), rows are retrieved using an array fetch mechanism.
Practice Overview
In this practice, use the data dictionary views to analyze cache management:
• Use the V$LIBRARYCACHE view to analyze library cache performance
• Use the V$SQLAREA view to see information about all shared cursors in the cache
Instructor Note
This practice session is optional. Use it when you want to have a break between the initial
(theoretical) lessons. Usually, you should try to go through the first seven lessons as quickly
as possible, so that the first workshop can be done. During the three days of this class, the
theory-to-practice ratio changes considerably.
Objectives
You can use various analysis tools to evaluate the performance of SQL statements.
After completing this lesson, you should be able to:
• Create the PLAN_TABLE by using the utlxplan.sql script
• Use the EXPLAIN PLAN command to show how a statement is processed
• Query the V$SQL_PLAN performance view to examine the execution plan for cursors
that were recently executed.
• Use the SQL*Plus AUTOTRACE setting to show SQL execution plans and statistics
EXPLAIN PLAN
  [ SET STATEMENT_ID = ’text’ ]
  [ INTO schema.table ]
  FOR statement

Field         Meaning
text          This is an optional identifier for the statement. You should
              enter a value to identify each statement so that you can later
              specify the statement that you want explained. This is
              especially important when you share the plan table with others,
              or when you keep multiple execution plans in the same plan
              table.
schema.table  This is the optional name of the output table. The default is
              PLAN_TABLE.
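EXPLAIN PLAN Example
The following is a hedged reconstruction of the demo statement (the exact query is not reproduced in this extract; the table, columns, and predicate values are inferred from the execution plan shown below):

SQL> EXPLAIN PLAN
  2  SET STATEMENT_ID = ’demo01’ FOR
  3  SELECT prod_id, prod_name
  4    FROM products
  5   WHERE prod_category = ’Men’
  6     AND prod_subcategory = ’Casual Shoes’;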
Explained.
SQL> select id
2 , lpad(’ ’, 2*level)||operation
3 ||decode(id,0,’ Cost = ’||position)
4 ||’ ’||options
5 ||’ ’||object_name as "Query Plan"
6 from plan_table
7 where statement_id = ’demo01’
8 connect by prior id = parent_id
9 start with id = 0;
ID Query Plan
----- -----------------------------------------
0 SELECT STATEMENT Cost =
1 TABLE ACCESS BY INDEX ROWID PRODUCTS
2 AND-EQUAL
3 INDEX RANGE SCAN PRODUCTS_PROD_CAT_IX
4 INDEX RANGE SCAN PRODUCTS_PROD_SUBCAT_IX
[Diagram: the execution plan as a tree. Operation 1, TABLE ACCESS (BY INDEX ROWID) of PRODUCTS, has child operation 2, AND-EQUAL, whose children, operations 3 and 4, are INDEX (RANGE SCAN) operations on the product category index and the product subcategory index.]
Using V$SQL_PLAN
This view provides a way of examining the execution plan for cursors that were recently
executed.
The information in this view is very similar to the output of an EXPLAIN PLAN statement.
However, EXPLAIN PLAN shows a theoretical plan that can be used if this statement were
to be executed, whereas V$SQL_PLAN contains the actual plan used. The execution plan
obtained by the EXPLAIN PLAN statement can be different from the execution plan used to
execute the cursor, because the cursor might have been compiled with different values of
session parameters (for example, HASH_AREA_SIZE).
V$SQL_PLAN shows the plan for a cursor, not for a SQL statement. The difference is that a SQL statement can have more than one cursor associated with it, with each cursor further identified by a CHILD_NUMBER. For example, the same statement executed by different users has different cursors associated with it if the objects being referenced are in different schemas. Similarly, different hints can cause different cursors. The V$SQL_PLAN view can be used to see the different plans for different child cursors of the same statement.
V$SQL_PLAN Columns
The view contains almost all PLAN_TABLE columns, in addition to new columns. The
columns that are also present in the PLAN_TABLE have the same values:
• ADDRESS
• HASH_VALUE
The two columns ADDRESS and HASH_VALUE can be used to join with V$SQLAREA to add
the cursor-specific information.
The ADDRESS, HASH_VALUE and CHILD_NUMBER columns can be used to join with
V$SQL to add the child cursor-specific information.
Querying V$SQL_PLAN
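The slide query is not reproduced in this extract; the following is a hedged sketch of such a query, using documented V$SQL_PLAN columns:

SQL> SELECT id
  2       , lpad(’ ’, 2*depth)||operation
  3         ||’ ’||options
  4         ||’ ’||object_name as "Query Plan"
  5    FROM v$sql_plan
  6   WHERE hash_value = :hv
  7     AND address = :addr
  8   ORDER BY id;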
The above statement shows the execution plan for the SQL SELECT statement shown
on page 3-6. Looking at the plan for a SQL statement is one of the first steps in tuning a SQL
statement. The SQL statement that is used to return the plan is identified by the statement’s
HASH_VALUE and ADDRESS from V$SQL_PLAN.
Note: For statements that use the rule-based approach, the COST column is null.
Instructor Note
These are a few examples of how a SQL statement can result in more than one cursor:
When the same table name resolves to two separate tables:
User1: SELECT * FROM CUSTOMERS;
User2: SELECT * FROM CUSTOMERS;
Where User2 has his or her own customer table, and User1 uses the table referenced by a
public synonym.
SET AUTOTRACE { OFF | ON | TRACE[ONLY] } [ EXPLAIN ] [ STATISTICS ]

SHOW AUTOTRACE
SQL*Plus AUTOTRACE
In SQL*Plus, you can automatically obtain the execution plan and some additional statistics
on the running of a SQL command by using the AUTOTRACE setting. Unlike the EXPLAIN
PLAN command, the statement is actually run, even if you choose to suppress the statement
output; thus you can display realistic statistics. AUTOTRACE, a feature available since Oracle Server release 7.3, is an excellent diagnostic tool for SQL statement tuning. Because it is purely declarative, it is easier to use than EXPLAIN PLAN.
Command Options
OFF Disables autotracing SQL statements
ON Enables autotracing SQL statements
TRACEONLY Enables autotracing SQL statements and suppresses statement output
EXPLAIN Displays execution plans but does not display statistics
STATISTICS Displays statistics but does not display execution plans
Note: If both command options EXPLAIN and STATISTICS are omitted, execution plans
and statistics are displayed by default.
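A short hedged usage example (the query is illustrative):

SQL> SET AUTOTRACE TRACEONLY
SQL> SELECT cust_last_name FROM customers WHERE cust_id = 180;
SQL> SET AUTOTRACE OFF

With TRACEONLY, the statement is executed and its output suppressed; the execution plan and statistics are still displayed.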
Instructor Note (for page 3-16)
There are some more settings for the (optional) second part of AUTOTRACE output, for
parallel and distributed queries. However, they are not relevant at this point.
Statistics
---------------------------------------------------
0 recursive calls
2 db block gets
1 consistent gets
0 physical reads
0 redo size
367 bytes sent via SQL*Net to client
430 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
AUTOTRACE Statistics
AUTOTRACE displays several statistics, not all of them relevant to the discussion at this
stage. In the next lesson, when discussing SQL Trace and TKPROF, this course goes into
more detail about SQL statement execution statistics. The most important ones are the
following:
db block gets The number of logical I/Os for current gets
consistent gets The number of logical I/Os for read-consistent gets
physical reads The number of blocks read from disk
redo size The amount of redo generated (for DML statements)
sorts (memory) The number of sorts performed in memory
sorts (disk) The number of sorts performed using temporary disk storage
Note: Current gets are reads of blocks in current mode; consistent gets are reads of blocks in read-consistent mode, which may require applying rollback (undo) information to the buffer. The db block gets, consistent gets, and physical reads are the three statistics that are usually monitored. These should be low compared to the number of rows retrieved. Sorts should be performed in memory rather than on disk.
Instructor Note
Demo script: demo03_03.sql
See Instructor Note on page 3-19.
Summary
Tuning involves careful analysis of execution plans and execution statistics. Several tools
are available to help you test and diagnose performance problems with a SQL statement:
• The PLAN_TABLE stores execution plans and can be created using the
utlxplan.sql script.
• EXPLAIN PLAN determines how the optimizer processes a statement (the statement
is not executed) and stores the execution plan in the PLAN_TABLE.
• You can also use the view V$SQL_PLAN to view the execution plan for recently
executed cursors.
• The SET AUTOTRACE command in SQL*Plus enables you to view the execution
plan and a variety of statistics about the execution whenever you execute a SQL
statement.
You can use several other tools to monitor performance, alerting you to the need to tune
SQL statements. Some of these tools will be covered in future lessons.
Practice Overview
In this practice, you create a PLAN_TABLE, analyze a SQL statement, and format the
analyzed results by using the rp.sql script. You also use the SQL*Plus AUTOTRACE
command to view execution plans.
Instructor Note (for page 3-17)
Demo script: demo03_03.sql
Note that consistent gets can be higher than expected due to array processing; each new array
needs one additional get. The number of array fetches equals (SQL*Net roundtrips - 2).
AUTOTRACE performs an implicit query on v$sesstat; see script demo03_04.sql for a
similar query on v$sesstat. This query shows the results for the session.
demo03_03.sql shows the results from the last statement.
Instructor Note (for the Practice)
The optimizer_mode parameter for the session should be set to rule to ensure that the
results are easier to interpret. At this point, the goal is to have the students use autotrace
and understand basic execution plans.
Instructor Note
This lesson introduces the SQL Trace utility and the associated initialization parameters.
It explains how to enable SQL Trace for a session and how to find your trace files. It also
covers the TKPROF command to format trace files.
Keep the practice session short; the students will have enough time to use SQL Trace and
TKPROF in the workshops.
Objectives
You can use various analysis tools to evaluate the performance of SQL statements. After
completing this lesson, you should be able to:
• Configure SQL Trace to gather statistics
• Set up appropriate initialization parameters
• Format the trace statistics using the TKPROF utility
• Interpret the output of the TKPROF command
Background Reading
For more information about SQL Trace and TKPROF, see the following:
• Oracle9i Database Performance Guide and Reference, in particular the chapters “Using EXPLAIN PLAN,” “Using SQL Trace and TKPROF,” “Using Autotrace in SQL*Plus,” and “Using Oracle Trace”
• Oracle9i Performance Methods
• Oracle9i Application Developer’s Guide, “Using Procedures and Packages”
• Oracle9i Reference, “Initialization Parameters”
[Diagram: the SQL Trace facility writes a trace file; the TKPROF utility formats the trace file into a report file, optionally connecting to the database to obtain execution plans.]
TIMED_STATISTICS = {false|true}
MAX_DUMP_FILE_SIZE = {n|500}
USER_DUMP_DEST = directory_path
Initialization Parameters
Several initialization parameters relate to SQL Trace.
TIMED_STATISTICS
The SQL Trace facility provides a variety of information about processes, optionally including
timing information. If you want timing information, you must turn on this parameter. You can
do so for the entire database by setting the following initialization parameter in the parameter
file before starting up and opening the database:
TIMED_STATISTICS = TRUE
The parameter also can be set dynamically for a particular session with the following
command:
SQL> ALTER SESSION SET timed_statistics=true;
The timing statistics have a resolution of one-hundredth of a second. This means that operations that finish quickly, such as simple queries, may not be timed accurately.
Having TIMED_STATISTICS turned on affects performance slightly because the Oracle
server must do some additional work. Therefore, this parameter is commonly turned off until
specifically desired for tuning purposes. However, keeping TIMED_STATISTICS turned on
could make trace files more useful for support engineers during a system crash.
Note: The values underlined in the slide (false for TIMED_STATISTICS and 500 for MAX_DUMP_FILE_SIZE) are the parameter defaults.
Initialization Parameters (Continued)
MAX_DUMP_FILE_SIZE and USER_DUMP_DEST
These two parameters control the size and destination of the output file:
MAX_DUMP_FILE_SIZE = n
USER_DUMP_DEST = directory_path
The MAX_DUMP_FILE_SIZE default value is 500, and this parameter value is
expressed in operating system blocks. MAX_DUMP_FILE_SIZE can also be changed at
the session level by using the ALTER SESSION command.
The default USER_DUMP_DEST location varies by operating system; it typically is the
default destination for system dumps on your system. USER_DUMP_DEST is a system-
level parameter that cannot be changed at the session level. It can only be changed
dynamically by a database administrator using the ALTER SYSTEM command.
Obtain Information about Parameter Settings
You can display current parameter values by querying the V$PARAMETER view:
SQL> SELECT name, value
2 FROM v$parameter
3 WHERE name LIKE ’%dump%’;
Note: The directory path specifications that show up in the VALUE column are operating system dependent. This view is accessible only by users with DBA privileges.
Instructor Note
You can also use the show parameter <search string> command; all
commands that were specific for the Server Manager tool are now available in
SQL*Plus, provided you have the correct privileges.
If the MAX_DUMP_FILE_SIZE is reached, *** Trace File Full *** is written to the
trace file. No notification or error message is displayed, and the background process
stops writing to the trace file.
Note that there is an additional initialization parameter, SQL_TRACE, that influences
SQL Trace. However, it is not a parameter to use when preparing for SQL Trace, but
rather when enabling SQL Trace. That is why it is not on the previous slide but is
introduced on the next slide. You should discourage use of that parameter and suggest
enabling SQL Trace at the session level instead.
Note that when you are using SQL Trace in Multi-Threaded Server (MTS), the shared
server processes will create trace files in the BACKGROUND_DUMP_DEST, not in the
USER_DUMP_DEST directory. It is recommended to use a dedicated server process for
tuning.
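For reference, a hedged sketch of switching SQL Trace on and off at the session level (DBMS_SESSION.SET_SQL_TRACE is the supplied-package alternative for sessions that cannot issue ALTER SESSION directly):

SQL> ALTER SESSION SET sql_trace = TRUE;
SQL> -- run the statements to be traced
SQL> ALTER SESSION SET sql_trace = FALSE;

SQL> EXECUTE dbms_session.set_sql_trace(TRUE);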
SQL> @readtrace.sql

OS> tkprof
OS> tkprof ora_901.trc run1.txt
OS> tkprof ora_901.trc run2.txt sys=no sort=execpu print=3
SORT = option
PRINT = n
EXPLAIN = username/password
INSERT = filename
SYS = NO
AGGREGATE = NO
RECORD = filename
TABLE = schema.tablename
SORT The order in which to sort the statements in the report (See previous page
for a list of values.)
PRINT Produces a report on this (sorted) number of statements only (This option
is especially useful in combination with the SORT option.)
EXPLAIN Logs on and executes EXPLAIN PLAN in the specified schema
INSERT Creates a SQL script to load the TKPROF results into a database table
SYS Disables the listing of recursive SQL statements, issued by the user SYS
AGGREGATE Disables or enables the (default) behavior of TKPROF, aggregating
identical SQL statements into one record
RECORD Creates a SQL script with all the nonrecursive SQL statements found in
the trace file (This script can be used to replay the tuning session later.)
TABLE Specifies the table to temporarily store execution plans before writing them
to the output file (This parameter is ignored if EXPLAIN is not specified.
It is useful when several individuals use TKPROF to tune the same
schema concurrently, to avoid destructive interference.)
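A hedged example combining several of these options (the connect string, file names, and sort keys are illustrative assumptions):

OS> tkprof ora_901.trc run3.txt explain=sh/sh sys=no sort=prsela,exeela,fchela print=5

This logs on as SH to generate execution plans, suppresses recursive SYS statements, and reports only the five statements with the highest parse, execute, and fetch elapsed times.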
Note: The PARSE value includes both hard and soft parses. A hard parse refers to the
development of the execution plan (including optimization); it is subsequently stored in the
library cache. A soft parse means that a SQL statement is sent for parsing to the kernel, but
the kernel finds it in the library cache, and only needs to verify things such as access rights.
Hard parses can be expensive, in particular due to the optimization; a soft parse is mostly
expensive in terms of library cache activity.
Recursive Calls
Sometimes in order to execute a SQL statement issued by a user, the Oracle server must
issue additional statements. Such statements are called recursive SQL statements. For
example, if you insert a row into a table that does not have enough space to hold that row,
the Oracle server makes recursive calls to allocate the space dynamically. Recursive calls
are also generated when data dictionary information is not available in the data dictionary
cache and must be retrieved from disk.
If recursive calls occur while the SQL trace facility is enabled, TKPROF marks them
clearly as recursive SQL statements in the output file. You can suppress the listing of
recursive calls in the output file by setting the SYS=NO command-line parameter. Note
that the statistics for recursive SQL statements are always included in the listing for the
SQL statement that caused the recursive call.
Library Cache Misses
TKPROF also lists the number of library cache misses resulting from parse and execute
steps for each SQL statement. These statistics appear on separate lines following the
tabular statistics.
Parsing User ID
This is the ID of the last user to parse the statement.
...
select cust_first_name, cust_last_name, cust_city, cust_state_province
from customers
where cust_last_name = ’Smith’
Summary
This lesson introduced you to the SQL Trace facility. By using SQL Trace, you can evaluate the performance of SQL statements. After SQL Trace is turned on, you can view execution plans and statistics for the SQL statements recorded in the generated trace file.
To use SQL Trace:
• Identify the appropriate initialization parameters that you want to use
• Turn SQL Trace on
• Format the trace file that is generated by using TKPROF
• Interpret the output of TKPROF
Practice Overview
In this practice, you view and change initialization parameters that influence tracing SQL
statements. Then you use the SQL Trace facility to analyze a SQL statement. The results of
the analysis are written to a trace file. Find the trace file and use TKPROF to format the trace
statistics. Interpret the formatted trace results.
Objectives
After completing this lesson, you should be able to:
• Describe the functions of the Oracle9i optimizer
• Distinguish between RBO and CBO
• Identify how the optimizer chooses between RBO and CBO
• Identify the factors that CBO considers when it selects an execution plan
• Identify RBO behavior and the use of the RBO ranking scheme
• Influence RBO behavior
• Set the optimizer approach at the instance and session level
Instructor Note
This lesson introduces the Oracle9i optimizer and explains the differences between RBO and
CBO and the default behavior when choosing between RBO and CBO. This is the only lesson
that specifically addresses RBO behavior. Statistics collection and how to influence CBO are
covered later in the course.
Overview
There are various methods and access paths that can be used to execute a SQL statement. It is
the optimizer’s job to choose which ones to use.
• RBO decides based on a set of rules.
• CBO bases this choice on statistics that it holds about the tables.
This lesson explains how the Oracle optimizer decides between these two approaches.
To use cost-based optimization, you should collect statistics about the tables involved. This is
covered in a later lesson. Cost-based optimization can be influenced in several sophisticated
ways; this is also covered in a later lesson.
Although it is still possible to use rule-based optimization, you should not use this method.
This lesson shows how rule-based optimization uses a ranking scheme to produce execution
plans. You will also see coding techniques that have been popular in the past to influence rule-
based optimization.
Finally, this lesson details the Oracle9i optimizer’s four basic approaches, which can be set at
the instance and session level.
Rule-Based Optimization
Rule-based optimization is supported in Oracle9i, but only for compatibility reasons with
Oracle version 6 and earlier. If you still have existing OLTP applications developed and
carefully tuned using Oracle version 6, you might want to continue using rule-based
optimization when you upgrade these applications to Oracle9i.
Rule-based optimization is syntax driven, which means that changing the syntax of SQL
statements could change the performance. Be aware of the following:
• No statistics are used, so the optimizer does not know anything about the number of rows in
a table and the values in the table columns.
• No costs of execution plans are calculated; the decision about an execution plan is fully
based on a set of rules.
Note: When using diagnostic tools such as EXPLAIN PLAN and SQL*Plus AUTOTRACE, the rule-based optimization approach is easy to recognize: the COST will always display a NULL value. As soon as a COST value appears, you know that the cost-based optimizer approach is used.
Cost-Based Optimization
Oracle7 was the first release that implemented cost-based optimization. Unlike rule-based
optimization, the cost-based optimizer bases its decisions on cost estimates, which in turn are
based on statistics stored in the data dictionary.
In calculating the cost of execution plans, cost-based optimization considers the following:
• The number of logical reads (This is the most important factor.)
• CPU utilization
• Network transmissions
Cost-based optimization is continually enhanced with each new Oracle server release, and
several newer features—such as hash joins, star queries, histograms, and index-organized
tables—are available only to cost-based optimization.
With Oracle9i, the estimated CPU usage of SQL functions and operators is included in the overall estimate of computer resource usage, together with disk I/O and memory, and is used to compute the cost of access paths and join orders. An estimate of network usage is also included when data is shipped between query servers running on different nodes.
The result is an improved accuracy of the cost and size models used by the query optimizer.
This helps the optimizer produce better execution plans, improving query performance.
OPTIMIZER_MODE = { FIRST_ROWS_1 | FIRST_ROWS_10 | FIRST_ROWS_100 | FIRST_ROWS_1000 }

SQL> ALTER SESSION SET optimizer_mode = FIRST_ROWS_n;

At the statement level, the equivalent hint is /*+ FIRST_ROWS(x) */.
Rule-Based Optimization
Rule-based optimization always uses indexes when possible, even when tables are small or when queries typically return a significant percentage of the rows, in which case a full table scan would be faster. Rule-based optimization is not aware of statistics, such as the number of rows in a table or the selectivity of a predicate.
Rule-based optimization uses a ranking scheme to decide which access path to the data to
choose.
Influencing RBO
You cannot influence rule-based optimization, except by creating indexes, dropping indexes, or
making indexes unusable by changing the statement syntax.
If the RBO finds two execution plans that score the same on the ranking scheme, it makes an
arbitrary decision. For example, suppose each available plan can use only one of two different
indexes. In that case the RBO decision is not documented, but dropping and recreating one of
the involved indexes will influence the RBO behavior: it will use the newly created index.
Thus, you should be aware that recreating a series of indexes in a different order will probably
influence the RBO behavior. Alternatively, you can change the SQL statement to make it
impossible for the RBO to use one of the indexes. This is discussed later in this lesson.
The lowest rank is 9, so the index on COUNTRY_ID will be used to access the rows of the
CUSTOMERS table. Note that although the index on COUNTRY_ID is used, a filter operation
must be performed on the CUST_CREDIT_LIMIT column to return the correct results.
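A hedged reconstruction of the kind of statement discussed (the predicate values are illustrative assumptions):

SQL> SELECT cust_last_name, cust_credit_limit
  2    FROM customers
  3   WHERE country_id = 52776
  4     AND cust_credit_limit > 10000;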
Summary
This lesson introduced both rule-based and cost-based optimization. Rule-based optimization is
still supported for backward compatibility. You have seen how the RBO uses a ranking
scheme, and how you can influence its behavior.
Cost-based optimization was introduced in Oracle7. The optimizer uses statistics to calculate
the cost of access paths, taking into account the number of logical I/Os, CPU usage, and
network traffic.
You also learned how the optimizer approach can be set at the instance and session level,
supporting four different settings:
• CHOOSE Default behavior: CBO when statistics are available, RBO when not
• RULE Forces RBO behavior
• FIRST_ROWS Optimizes response time for the first result
• ALL_ROWS Optimizes overall throughput
Practice Overview
In this practice, you analyze execution plans using RBO. After changing your session to use the
CBO, you generate new execution plans and examine the results.
Note: By default, statistics exist on the tables in the SH schema.
Objectives
After completing this lesson, you should be able to do the following:
• Identify ROWIDs
• Identify index types
• Show the relationship between indexes and constraints
• Create indexes manually
• Identify index issues with foreign keys
• Identify basic access methods
• Describe skip scanning of indexes
• Monitor index usage
• Describe clusters
Instructor Note (for page 6-3)
Index-organized tables have logical ROWIDs instead of real ROWIDs. This is discussed in the
“Alternative Storage Techniques” lesson.
Demo scripts: demo06_01.sql and demo06_02.sql
Purpose: To show the students ROWIDs.
ROWIDs
Oracle uses ROWIDs to store addresses of table rows, such as in indexes. The extended
ROWID format was introduced with the new partitioning capabilities of Oracle 8.0. A
restricted ROWID data type is still available for backward compatibility. Oracle still uses
restricted ROWIDs internally, when the extended format is not required.
Each Oracle table has a pseudocolumn named ROWID. ROWIDs are not stored in the
database. However, you can create tables that contain columns having the ROWID data type,
although Oracle does not guarantee that the values of such columns are valid ROWIDs.
The extended ROWID has a four-piece format, using a base-64 encoding technique. In the
example in the slide, following are the fields and their meaning:
AAABDL Data object number; identifies the database segment
AAC Relative file number; unique within the tablespace
AAAF0Z Data block number; relative to its data file
AAA-AAE Row numbers in the block
ROWID Manipulation
You can use the DBMS_ROWID package to extract information from an extended ROWID or to
convert a ROWID from extended format to restricted format (or vice versa). See the Oracle9i
Supplied Packages Reference for more information.
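For example, a minimal sketch that uses DBMS_ROWID to decode the pieces of the extended
ROWID for a few CUSTOMERS rows (the column list is illustrative):
SQL> SELECT cust_id
2 , dbms_rowid.rowid_object(rowid) AS object_no
3 , dbms_rowid.rowid_relative_fno(rowid) AS file_no
4 , dbms_rowid.rowid_block_number(rowid) AS block_no
5 , dbms_rowid.rowid_row_number(rowid) AS row_no
6 FROM customers
7 WHERE ROWNUM <= 5;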
Indexes
An index is a database object that is logically and physically independent of the table data. The
Oracle9i Server may use an index to access data required by a SQL statement, or use indexes
to enforce integrity constraints. You can create and drop indexes at any time. The Oracle9i
Server automatically maintains indexes when the related data changes.
Index Types
Indexes can be unique or nonunique. Unique indexes guarantee that no two index entries have
the same value. A composite index (also called a concatenated index) is an index that you
create on multiple columns in a table (up to 32). Columns in a composite index can appear in
any order, and need not be adjacent in the table.
Index Storage Techniques
For standard indexes, Oracle uses B*-tree indexes that are balanced to equalize access times.
Bitmap indexes are discussed in the “Advanced Indexes” lesson.
Reverse key indexes reverse the bytes of the column values. They are beneficial for
monotonically increasing values, which can otherwise lead to skewed indexes as older values
are deleted over time. They are not covered in this course.
Function-based indexes are also covered in the “Advanced Indexes” lesson.
Domain indexes are application-specific indexes. They are created, managed, and accessed by
routines supplied by an index type. They will not be covered in this course.
B*-Tree Indexes
[Figure: B*-tree index structure with a root block, branch blocks, and leaf blocks holding the
index entries (for example, ALLEN, FORD, MARTIN, and SMITH).]
The basic CREATE INDEX syntax:
CREATE [UNIQUE | BITMAP] INDEX index_name
ON table (column | expression [ASC | DESC] [, ...]);
Index Statistics
In the “Collecting Statistics” lesson, statistics are discussed in detail. You can collect and
retrieve some useful index statistics with the following commands:
SQL> analyze index <index_name> validate structure;
SQL> select * from index_histogram;
SQL> select * from index_stats;
The first command populates the two views that are queried by the last two commands. The
first view gives information about the selectivity of the index (cardinality of key values), and
the second view can tell you the height of the B*-tree, the number of blocks, the number of
distinct key values, and the number of leaf block entries.
ALTER INDEX Statement
You can use the ALTER INDEX statement to change an index definition, or rebuild an existing
index. On the next page you see some guidelines about when this could be needed. See the
Oracle9i SQL Reference for the full ALTER INDEX statement syntax.
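For example, a minimal sketch of rebuilding an index (the index name is hypothetical):
SQL> ALTER INDEX customers_country_ix REBUILD;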
Instructor Note
Typically, a DBA uses these views to monitor and maintain statistics.
Clusters: Overview
A cluster is a structure that is used to physically store rows together because they share
common column values. There are two types of clusters: index clusters and hash clusters.
The columns used to cluster the rows are called the cluster key:
• The cluster key can consist of one or more columns.
• Tables in a cluster have columns that correspond to the cluster key. Clustering is a
mechanism that is transparent to the applications using the tables. Data in a clustered
table can be manipulated as though it were stored in a regular table.
• Updating one of the columns in the cluster key may cause the Oracle server to physically
relocate the row.
• The cluster key is independent of the primary key. The tables in a cluster can have a
primary key, which may be the cluster key or a different set of columns.
Clusters can waste disk space if the cluster key distribution is skewed. Choosing an appropriate
cluster size is one of the most important issues when you consider whether to cluster your
tables.
Cluster Example
If they are stored as regular tables, CUSTOMERS and COUNTRIES are placed in different
segments. This means that the tables use their own set of blocks—a block that stores rows from
the CUSTOMERS table does not contain data from the COUNTRIES table, and vice versa.
Because customers are usually accessed by COUNTRY_ID, the developer may cluster both
tables or just the CUSTOMERS table on the COUNTRY_ID column. If the tables CUSTOMERS
and COUNTRIES are stored in a cluster, they share the same cluster segment; see the example
in the slide. A block in this segment stores rows from both tables. This means that a full table
scan on the COUNTRIES table takes more time, because its rows are interspersed with the
rows from the CUSTOMERS table.
If a table is stored in a cluster, the cluster becomes the physical unit of storage, and the table is
a logical entity; that is, the clustering is transparent to the users and applications.
Instructor Note
These two tables are not ideal for clustering; the slide is meant only for illustration purposes.
Note that this is an index cluster.
Index Clusters
An index cluster uses a special index, known as the cluster index, to maintain the data within
the cluster. In the example on the previous page, when a user inserts a new customer, the
cluster index ensures that the new customer is placed in the same block as the other customers
with the same country_id.
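For example, a minimal sketch of creating such an index cluster (all object names are
hypothetical; COUNTRY_ID is assumed to be CHAR(2), as in the SH schema):
SQL> CREATE CLUSTER country_cluster (country_id CHAR(2));
SQL> CREATE INDEX country_cluster_ix ON CLUSTER country_cluster;
SQL> CREATE TABLE clustered_customers
2 CLUSTER country_cluster (country_id)
3 AS SELECT * FROM customers;
Note that the cluster index must exist before rows can be inserted into a clustered table.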
Choose Appropriate Columns for the Cluster Key
Choose cluster key columns carefully. If you use multiple columns in queries that join the
tables, make the cluster key a composite key. In general, the same column characteristics that
make a good index apply to cluster indexes.
A good cluster key has enough unique values so that the group of rows corresponding to each
key value fills approximately one or two data blocks. Too many rows per cluster key value can
require extra searching to find rows for that key. Cluster key values that are too general (such
as MALE and FEMALE) result in excessive searching and can result in poor performance.
Instructor Note
The “Choose Appropriate Columns for the Cluster Key” paragraph is also true for hash
clusters.
DML Performance
DML does not perform as well on a cluster as it does on a regular table. When a row is inserted
into a cluster, it must be placed in the same block as the other rows with the same cluster key. If
several rows with different cluster keys are being inserted, then several blocks must be read.
However, for a regular table, the rows can be inserted into the single block that is the first
block on the free list.
Cluster Key Column Updates
When the key value in a cluster is updated, the row may need to be moved physically to
another block. This can cause degradation in performance. Therefore, columns that are
frequently updated are not good candidates for being part of the cluster key.
Row Frequency
Clusters are generally more suitable in situations where the frequency of occurrence of key
values is uniform. In the COUNTRIES example, if the countries contain almost an equal
number of customers, then clustering may be useful. If a certain country has only one or two
customers and others have several hundred, clustering may not be justified. If a country has
many customers, then some of the customer rows may need to be stored in overflow blocks.
Direct Loads
Clusters cannot be loaded using direct loads.
Hash Clusters
A hash cluster uses a hash function to calculate the location of a row. The hash function uses
the cluster key and can either be system generated or defined by the developer.
When a row is inserted into a table in a hash cluster:
• The hash key columns are used to compute a hash value
• The row is stored based on the hash value
The hash function is also used to find the row while retrieving the data from a hashed table.
Example
Customer rows are stored in a hash cluster using the Customer ID as the cluster key. When
Customer 501 is inserted into the cluster, the hash function determines where to store the row
using the cluster key value 501. Later, when Customer 501 is needed for retrieval, the cluster
key value 501 is input into the hash function, which points to the location where the row was
inserted.
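For example, a minimal sketch of creating a hash cluster (all names and sizing values are
hypothetical):
SQL> CREATE CLUSTER cust_hash_cluster (cust_id NUMBER)
2 SIZE 512 HASHKEYS 50000;
SQL> CREATE TABLE hashed_customers
2 CLUSTER cust_hash_cluster (cust_id)
3 AS SELECT * FROM customers;
The HASHKEYS clause is what makes this a hash cluster; no cluster index is needed.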
Hash Cluster Performance Characteristics
The hash function is used to locate the row while retrieving the data from a hashed table
without the I/O and CPU time required to search through an index. Because large tables have
indexes with more B*-tree levels, hash clusters perform better when there are many rows in the
table.
Hash Clusters: Limitations
• Use clusters:
– With tables that are primarily queried or joined on the cluster key
– When the cluster key distribution is even
• Clusters can also reduce performance for:
– DML statements
– Full table scans
Summary
This lesson introduced you to indexes. An index is a database object that is logically and
physically independent of the table data. The Oracle server may use an index to access data
required by a SQL statement, or use indexes to enforce integrity constraints.
There are different index types. For standard indexes, Oracle uses B*-tree indexes that are
balanced to equalize access times. You can create an index manually by using the CREATE
INDEX syntax, or automatically by creating unique or primary key constraints on tables.
Indexes can impact performance. There are different methods to scan a table: full table scans,
index scans, and fast full index scans. Index scans can improve the performance of many SQL
statements.
Instructor Note
After completing this lesson, it is possible to do workshops 1 (single table, single predicate)
and 2 (sorting, grouping, and set operators). Note that the next lesson has a short practice
session. Thus, a typical approach might be to do workshop 1 now, continue with the next two
lessons, and then do workshop 2. But, as indicated, you can do workshop 2 earlier as well.
Objectives
• Use the ANALYZE command to provide the Oracle9i cost-based optimizer with statistics.
• Identify table, index, column, and cluster statistics. View these statistics in the data
dictionary views.
• Use the DBMS_STATS package to collect and manage optimizer statistics.
• Identify predicate selectivity calculations, assuming even data distribution. Identify the
consequences of using bind variables for predicate selectivity.
• Create histograms for columns with skewed data.
• Choose appropriate values for:
– A sample size (when using the ESTIMATE option of the ANALYZE command)
– The number of histogram buckets
The basic ANALYZE syntax:
ANALYZE {TABLE | INDEX | CLUSTER} [schema.]name
{ COMPUTE STATISTICS [for_clause]
| ESTIMATE STATISTICS [for_clause] [sample_clause]
| DELETE STATISTICS };
for_clause (COMPUTE and ESTIMATE):
FOR TABLE
| FOR ALL [INDEXED] COLUMNS [SIZE n]
| FOR COLUMNS column [SIZE n] [, column [SIZE n]]...
| FOR ALL INDEXES
sample_clause (ESTIMATE only):
SAMPLE n {ROWS | PERCENT}
• Number of rows
• Number of blocks (always exact)
• Number of empty blocks (always exact)
• Average available free space
• Number of chained or migrated rows
• Average row length
• Last ANALYZE date and sample size
• Data dictionary views:
– USER_TABLES
– ALL_TABLES
Table Statistics
Following are the table statistics collected by the ANALYZE command:
• Number of rows
• Number of blocks (always exact)
• Number of empty (never used) blocks (always exact)
• Average row length (in bytes)
• Average available free space (in bytes per block)
• Number of chained or migrated rows
• Last ANALYZE date and sample size
Note: Chaining occurs when a single row is too big to fit in any database block, usually because
column widths are too big, or there are too many columns defined for the table.
Migration happens when the space reserved in the block for updates is insufficient, which causes
the row being updated to be moved into another block, leaving a pointer in the original block to
the new address for the row.
Index Statistics
Note that the index B*-tree-level statistic is always exact, regardless of whether you compute or
estimate index statistics. The number of distinct keys may include rows that have been deleted.
Index Clustering Factor
The index clustering factor is an important index statistic for the CBO to estimate index scan
costs. It is an indication of the number of (logical) data block visits needed to retrieve all table
rows via the index. If the index entries follow the table row order, this value approaches the
number of data blocks (each block is visited only once). Conversely, if the index entries
randomly point at different data blocks, the clustering factor could approach the number of rows.
Suppose a typical table contains 50 rows per data block and 5,000 rows in total. When a query
looks for 50 rows (selectivity is 1%), the clustering factor indicates whether you must visit one
block (best case: 1% of the data blocks) or 50 blocks (worst case: 50% of the data blocks).
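You can compare the clustering factor of an index with the table's block and row counts; a
minimal sketch:
SQL> SELECT index_name, clustering_factor, num_rows
2 FROM user_indexes
3 WHERE table_name = 'CUSTOMERS';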
Instructor Note
The index clustering factor is a measure of how well ordered the data is in the table in reference
to the key.
Column Statistics
These are the column statistics collected by the ANALYZE command:
• Number of distinct values
• The lowest value
• The highest value
• Last ANALYZE date and sample size
Instructor Note
This statement causes histograms to be created for every column. This is not advisable.
SQL> analyze table products
2 compute statistics for all columns;
Instead, use the syntax FOR TABLE FOR COLUMNS, such as:
SQL> analyze table products compute statistics
2 for table for columns prod_list_price
3 size 50;
(This example is shown on page 7-22.)
Both the lowest and the highest value are stored in RAW (binary) format.
dbms_stats.GATHER_TABLE_STATS
('SH' -- schema
,'CUSTOMERS' -- table
, NULL -- partition
, 20 -- sample size (%)
, FALSE -- block sample?
,'FOR ALL COLUMNS' -- column spec
, 4 -- degree of parallelism
,'DEFAULT' -- granularity
, TRUE -- cascade to indexes
);
DBMS_STATS Procedures
Use the following procedures to gather statistics on indexes, tables, columns, and partitions:
Procedure Description
GATHER_INDEX_STATS Collects index statistics
GATHER_TABLE_STATS Collects table, column, and index statistics
GATHER_SCHEMA_STATS Collects statistics for all objects in a schema
GATHER_DATABASE_STATS Collects statistics for all objects in a database
DBMS_STATS does not gather cluster statistics, but you can use it to gather statistics on the
individual tables in the cluster. Other options for gathering statistics with DBMS_STATS include
the following:
• Collect statistics either serially or in parallel. Index statistics are gathered only serially.
• Compute or estimate statistics; use block samples or row samples to estimate statistics.
• Specify the columns for which statistics are needed with GATHER_TABLE_STATS.
• Use the CASCADE option to collect index statistics when generating table statistics.
• Hold statistics in statistics tables to experiment with different sets of statistics.
Instructor Note
Run the demo07_05.sql script to show the above code. This demonstration script works with
scripts demo07_06.sql and demo07_07.sql, which are run on the next pages.
Copy Statistics Between Databases
[Figure: copying statistics between databases. Statistics are copied from the data dictionary to a
user-defined statistics table, the table is exported and imported into the other database, and the
statistics are then copied from the user-defined table back into the data dictionary.]
dbms_stats.CREATE_STAT_TABLE
('SH' -- schema
,'STATS' -- statistics table name
,'DATA01' -- tablespace
);
dbms_stats.EXPORT_TABLE_STATS
('SH' -- schema
,'CUSTOMERS' -- table name
, NULL -- no partitions
,'STATS' -- statistics table name
, NULL -- id for statistics
, TRUE -- index statistics
);
Example
To copy statistics for a table from the data dictionary to a user-defined statistics table, use the
CREATE_STAT_TABLE procedure to create the user-defined table and
EXPORT_TABLE_STATS to copy the statistics. DBMS_STATS includes the following
procedures for copying statistics:
Procedure Description
CREATE_STAT_TABLE Creates a user-defined table capable of holding statistics
DROP_STAT_TABLE Drops a user-defined statistics table
EXPORT_object_STATS Exports statistics from the data dictionary to a user-defined table
IMPORT_object_STATS Imports statistics from a user-defined table to the data dictionary
begin
dbms_stats.CREATE_STAT_TABLE
('SH', 'STATS');
dbms_stats.GATHER_TABLE_STATS
('SH', 'CUSTOMERS'
,stattab => 'STATS');
end;
begin
dbms_stats.DELETE_TABLE_STATS
('SH', 'CUSTOMERS');
dbms_stats.IMPORT_TABLE_STATS
('SH', 'CUSTOMERS'
,stattab => 'STATS');
end;
Predicate Selectivity
The expected percentage of rows returned (the selectivity) depends on the type of operation
performed in the WHERE clause. The cost-based approach is more likely to choose an index
access path for a query with good selectivity. The Oracle9i optimizer uses the following rules
when calculating selectivity.
Equality Conditions
Unique or primary key = constant
This is a single-row predicate, returning at most one row; it has maximum selectivity.
Nonunique index = constant
The query may return more than one row, so the optimizer uses the number of distinct key values
in the column. The selectivity is 1 divided by the number of distinct values.
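For example, assuming a column with eight distinct values (a hypothetical figure), an equality
predicate on that column has a selectivity of 1/8, or 12.5%.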
Bounded and Unbounded Ranges
Bounded and unbounded ranges need statistics to calculate selectivity, thus CBO and RBO yield
different results. The RBO must rely on the syntax and its ranking scheme; the CBO uses the
following formula:
selectivity = (high - low + 1) / (max - min + 1)
where low and high are the bounds of the range predicate, and min and max are the lowest and
highest values stored for the column.
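For example, for a predicate such as col BETWEEN 20 AND 40 on a column with a stored
minimum of 0 and maximum of 100 (hypothetical values), the formula yields a selectivity of
(40 - 20 + 1) / (100 - 0 + 1), or roughly 21%.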
Bind Variables and Predicate Selectivity
[Figure: the same skewed column data shown as a width-balanced histogram and as a
height-balanced histogram. Height-balanced histograms give a better sense of the data
distribution.]
Histograms
The optimizer must estimate the number of rows processed by a given query. This is achieved
partly by estimating the selectivities of the query’s predicates. The accuracy of these estimates
depends on the optimizer’s knowledge of the data distribution. Without histograms, an even
distribution is assumed.
Types of Histograms
Width-balanced histograms partition the attribute domain into ranges of equal width, called
buckets, and count the number of rows whose column value falls into each bucket. If several
rows contain the same value, they are all put into the same bucket, increasing the height of that
bucket.
In height-balanced histograms, each bucket has approximately the same number of rows. If
several rows contain the same value, they may be put in the same bucket or spread across several
buckets, as the histogram balances the heights of the buckets. Some buckets cover only one or a
few values, because there are many rows with those values. Some buckets cover many values,
because there are few rows with those values.
Height-balanced histograms are more suitable for computing selectivity estimates and are used
by the Oracle9i Server.
Histogram Tips
For many tables, it is appropriate to use the FOR ALL INDEXED COLUMNS clause to collect
statistics on all indexed columns. Be aware, however, that histograms are also created on unique
and primary key constrained columns that are enforced through indexes, which might be a waste
of space.
If the data distribution is not static, the histogram must be frequently updated.
Because the optimizer does not take into consideration the current value of bind variables,
histograms are not helpful (and therefore not worth the expense) when bind variables are used.
Histograms allocate additional storage and should be used only when they can substantially
improve the query plans.
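For example, a minimal sketch of creating a histogram on a single skewed column with
DBMS_STATS (the SIZE value is illustrative):
SQL> begin
2 dbms_stats.GATHER_TABLE_STATS
3 ('SH', 'PRODUCTS'
4 ,method_opt => 'FOR COLUMNS PROD_LIST_PRICE SIZE 50');
5 end;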
• Histogram information:
USER/ALL_HISTOGRAMS
SQL> select endpoint_number, endpoint_value
2 from dba_tab_histograms
3 where table_name = 'PRODUCTS'
4 and column_name = 'PROD_LIST_PRICE';
Summary
This lesson introduced you to the statistics used by the cost-based optimizer. By using the
ANALYZE command or the DBMS_STATS package, you can collect statistics on your tables;
these statistics are used by the cost-based optimizer to determine optimal execution plans for
SQL statements on analyzed tables.
Selectivity, the expected percentage of rows returned, depends on the type of operation
performed in the WHERE clause. The cost-based approach is more likely to choose an index
access path for a query with good selectivity. Selectivity is influenced by data distribution. When
bind variables are used in a SQL statement, the optimizer must use a built-in default selectivity.
These built-in values cannot be influenced.
Oracle can use a histogram to decide whether or not to use the index. A histogram stores
information about the frequency of various column values. Without histograms, an even
distribution is assumed. Create histograms for columns with skewed data.
Instructor Note (for page 7-28)
Demonstration script: demo07_09.sql. This shows the result of the previous demonstration
when you created histograms on the PRODUCTS table. You can also use the histdemo.sql
script, based on the example histogram in this lesson. That script does not show endpoint
numbers 0 and 1 (see the note above).
Practice Overview
In this practice, you generate table statistics for the PRODUCTS table. Because the
PROD_STATUS column of the PRODUCTS table is a good candidate for a histogram, you create
a histogram for that column. You calculate the selectivity of a certain predicate on the column,
both before and after creating the histogram. Then you query the data dictionary for the last
analyze date and sample size. Lastly, you delete the statistics on the PRODUCTS table from the
data dictionary.
Objectives
After completing this lesson, you should be able to:
• Influence the Oracle9i optimizer behavior at the following levels:
– Instance level, setting initialization parameters
– Session level, using the ALTER SESSION command
– Statement level, using hints
• Specify access path hints
• Specify hints on and in views
Note
• Influencing the optimizer at the instance and session level is also covered in the “Rule-
Based Optimization Versus Cost-Based Optimization” lesson.
• When statistics are not available and cost-based optimization is forced, the optimizer uses
default built-in values instead. This may produce suboptimal execution plans.
• OPTIMIZER_FEATURES_ENABLE
– Set to current release to enable the latest optimizer
features
– Defaults to current release of database
• OPTIMIZER_INDEX_COST_ADJ
Hints are specified with one of two comment syntaxes immediately after the statement keyword:
{SELECT | INSERT | UPDATE | DELETE} /*+ hint [text] [hint [text]]... */
{SELECT | INSERT | UPDATE | DELETE} --+ hint [text] [hint [text]]...
Hint Recommendations
Use hints as a last remedy when tuning SQL statements. Be aware that, for a hinted statement,
the optimizer cannot change the execution plan, even when better alternatives become available.
Hints may become less valid (or even invalid) when the database structure or contents change.
Note: Beware that all hints, except the RULE hint, force cost-based optimization regardless of
your optimizer mode setting at the session or instance level. Hence statistics may be required to
ensure an optimal execution plan.
Hint Categories
Parameters to change the optimizer mode at the instance or session level (RULE, CHOOSE,
ALL_ROWS, FIRST_ROWS(n)) were discussed in the “Rule-Based Optimization Versus Cost-
Based Optimization” lesson. You can specify these four values as a hint at the statement level as
well.
Hints for Access Path Methods
This lesson concentrates on this hint category.
Hints for Parallel Execution
This topic is covered in a separate course.
Hints for Join Orders and Operations
Optimizing joins is covered in the “Sorting and Join Techniques” lesson.
Note: The NO_INDEX hint is useful if you use distributed query optimization. It applies to
function-based, B*-tree, bitmap, cluster, and domain indexes. If this hint does not specify an
index name, the optimizer will not consider a scan on any index on the table.
Specifying Index Names
An index hint can optionally specify one or more index names:
• If this hint specifies a single available index, the optimizer performs a scan on this index. The
optimizer does not consider a full table scan or a scan on another index on the table.
• If this hint specifies a list of available indexes, the optimizer considers the cost of a scan on
each index in the list and then performs the index scan with the lowest cost. The optimizer
may also choose to scan multiple indexes from this list and merge the results, if such an access
path has the lowest cost. The optimizer does not consider a full table scan or a scan on an
index that is not listed in the hint.
• If this hint specifies no indexes, the optimizer considers the cost of a scan on each available
index on the table and then performs the index scan with the lowest cost. The optimizer might
also choose to scan multiple indexes and merge the results, if such an access path has the
lowest cost. The optimizer does not consider a full table scan.
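For example, a minimal sketch of an index hint naming a single index (the index name is
hypothetical):
SQL> SELECT /*+ INDEX(c cust_country_ix) */
2 c.cust_last_name
3 FROM customers c
4 WHERE c.country_id = 'JP';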
Fast Full Index Scans
Fast full index scans are an alternative to full table scans when the index contains all the columns
that are needed for the query. It cannot be used to eliminate a sort operation. It reads the entire
index by using multiblock reads (as in a full table scan).
Note: During a full table scan, Oracle can read multiple blocks into memory in a single I/O
operation. This is commonly referred to as multiblock reads. It is controlled by the
DB_FILE_MULTIBLOCK_READ_COUNT initialization parameter. This parameter's value is OS
dependent.
Instructor Note (for the next page)
Demonstration script demo08_02.sql.
Purpose: This script first drops all indexes on the CUSTOMERS table, then creates an index on the
COUNTRY_ID column. Several SELECT statements are run, some including hints. After each
SELECT statement, the screen is paused for you to explain the results before moving to the next
statement.
Summary
This lesson introduced you to additional optimizer settings and hints.
You can set the optimizer mode to use either the cost-based method or rule-based method. The
supported optimizer mode values are CHOOSE, RULE, FIRST_ROWS(n), and ALL_ROWS.
By using hints, you can influence the CBO at the statement level. Use hints as a last remedy when
tuning SQL statements. There are several hint categories, one of which is hints for access path
methods.
To specify a hint, use the hint syntax within the SQL statement. Alternatively, you can use the
Oracle SQL Analyze graphical interface to specify a hint within SQL statements.
Be careful when using hints with views. Hints inside views or on views are handled differently
depending on whether or not the view is mergeable into the top-level query.
Instructor Note
After this lesson, students can complete Workshop 2 (sorting, grouping, and set functions). Note
that some theory about sorting is discussed in the beginning of the next lesson.
Practice Overview
In this practice, you investigate RBO behavior for a query on the PROMOTIONS table, which is a
small table with an index. This is accomplished by deleting table statistics and setting the
optimizer mode to CHOOSE. Then you use a hint to force a full table scan on the table. Compare
the results and decide the best approach for this query.
Objectives
After completing this lesson, you should be able to:
• Optimize sort performance by using Top-N SQL
• Describe the techniques used for executing join statements
• Explain the optimization performed on joins
• Find the optimal execution plan for a join
Joins are probably the most frequent cause of performance problems.
Instructor Note
For more details about hash joins, see Oracle9i Performance and Tuning Guide.
Instructor Note (for page 9-4)
Demonstration script: demo09_01.sql
Instructor Note (for page 9-5)
Demonstration script: Run the demo09_02.sql script to demonstrate the code shown.
SQL> SELECT *
2 FROM (SELECT prod_id
3 , prod_name
4 , prod_list_price
5 , prod_min_price
6 FROM products
7 ORDER BY prod_list_price DESC)
8 WHERE ROWNUM <= 5;
Top-N SQL
The idea behind the Top-N SQL feature is that you do not need to sort a full set if, for example,
you are only interested in the four highest (or lowest) values.
If a set is too big to be sorted in memory, performance degrades significantly. This is caused by
the I/O to and from temporary storage of intermediate results on disk.
If you only want to know the four highest values, you only need an array with four slots to scan
the set and keep the four highest values in that array.
The WHERE clause of the statement above is merged into the in-line view to prevent the full
PRODUCTS table from being sorted by PROD_LIST_PRICE. This appears in the execution
plan as follows:
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=248 Card=5 Bytes=660000)
1 0 COUNT (STOPKEY)
2 1 VIEW (Cost=248 Card=10000 Bytes=660000)
3 2 SORT (ORDER BY STOPKEY) (Cost=248 Card=10000 Bytes=370000)
4 3 TABLE ACCESS (FULL) OF ’PRODUCTS’ (Cost=109 Card=10000
Bytes=370000)
• Join statement
• Join predicate, nonjoin predicate
• Single-row predicate
SQL> SELECT c.cust_last_name, c.cust_first_name,
2 co.country_id, co.country_name
3 FROM customers c , countries co
4 WHERE c.country_id = co.country_id
5 AND co.country_id = 'JP'
6 OR c.cust_id = 205;
Join Terminology
A join statement is a select statement with more than one table in the FROM clause.
However, the optimizer uses the techniques usually associated with joins when executing other
kinds of statements where more than one table is referenced, such as in subqueries. Subqueries
are discussed later in this lesson. This is an example of a SQL:1999 compliant join statement
which is available in Oracle9i:
SQL> SELECT c.cust_last_name, c.cust_first_name,
2 co.country_id, co.country_name
3 FROM customers c JOIN countries co
4 ON (c.country_id = co.country_id) -- join predicate
5 AND co.country_id = 'JP' -- nonjoin predicate
6 OR c.cust_id = 205; -- single-row predicate
A join predicate is a predicate in the WHERE clause that combines the columns of two of the
tables in the join.
A nonjoin predicate is a predicate in the WHERE clause that references only one table.
A single-row predicate is an equality predicate on a column with a unique or primary key
constraint, or a column with a unique index without corresponding constraint. The optimizer
knows that these predicates always return one row or no rows at all.
• Equijoin
SQL> SELECT c.cust_last_name, co.country_name
2 FROM customers c, countries co
3 WHERE c.country_id = co.country_id;
• Nonequijoin
SQL> SELECT c.cust_last_name, c.cust_credit_limit
2 , cr.credit_limit_id
3 FROM customers c, credit_limits cr
4 WHERE c.cust_credit_limit
5 BETWEEN cr.credit_limit_min
6 AND cr.credit_limit_max;
Join Operations
A join operation combines the output from two row sources and returns one resulting row
source. Row sources were introduced in the “EXPLAIN and AUTOTRACE” lesson.
Join Operation Types
The Oracle9i Server supports different join operation types:
• Nested loops join
• Sort/merge join
• Hash join
• Cluster join
• Full outer Join
Instructor Note
Oracle9i also supports index joins as a new access path. The INDEX_JOIN hint instructs the
optimizer to use that path. This hint has an effect if a sufficiently small number of indexes exist
that (together) contain all columns required to resolve the query, so that access to that table can
be avoided. So it is not a different join technique, but rather a smart way to avoid table access
by using multiple indexes.
Nested Loops Joins
[Figure: nested loops join operation tree with an outer (driving) row source and an inner row
source.]
Sort/Merge Joins
In the sort operations, the two row sources are sorted on the values of the columns used in the
join predicate. If a row source has already been sorted in a previous operation, the sort merge
operation skips the sort on that row source.
Sorting could make this join technique expensive, especially if sorting cannot be performed in
memory.
The merge operation combines the two sorted row sources to retrieve every pair of rows that
contain matching values for the columns used in the join predicate.
Basic Execution Plan
MERGE (JOIN)
SORT (JOIN)
TABLE ACCESS (…) OF tableA
SORT (JOIN)
TABLE ACCESS (…) OF tableB
[Figure: sort/merge join operation tree. The merge operation combines two sorted inputs, each
produced by sorting the output of a table access.]
Hash Joins
The Oracle9i Server considers hash joins only when you use the cost-based optimizer. Hash
joins, like sort/merge joins, can be performed only for equijoins. These are the steps performed
for a hash join:
• The Oracle server performs a full table scan on both tables and splits each into as many
partitions as required, based on the available memory.
• The Oracle server builds a hash table on the smallest partition and uses the other partition
to probe the hash table. All partition pairs that do not fit into memory are placed onto disk.
The number of partitions into which the tables are split depends on the amount of available
memory.
Basic Execution Plan
HASH JOIN
TABLE ACCESS (..) OF tableA
TABLE ACCESS (..) OF tableB
Performance Considerations
As a general rule, hash joins outperform sort/merge joins. In some cases, a sort/merge join can
outperform a nested loops join; it is even more likely that a hash join will.
[Figure: hash join operation tree with two table-access input row sources.]
Outer Joins
An outer join is a join where the join predicates have a plus (+) sign added to signify that the
join returns a row even if the table with this special predicate (the outer joined table) does not
have a row that satisfies the join condition.
Outer Join Example
SQL> select s.time_id, t.time_id
2 from sales s, times t
3 where s.time_id (+) = t.time_id -- join predicate with +
4 and s.promo_id (+) = 19 -- non-join predicate with +
5 and s.quantity_sold (+) > 45 -- non-join predicate with +
6 /
Note: Predicates without a plus (+) sign on the table that is outer joined will disable the outer
join functionality.
Instructor Note
Demonstration script: Run the demo09_04.sql script to demonstrate the code shown. The
first example shows the outer join disabled; the second example shows the outer join enabled.
Outer Joins
In Oracle9i, SQL:1999-compliant join syntax was introduced. You can now specify RIGHT,
LEFT, and FULL OUTER joins instead of using the (+) sign. This is not intended to improve
performance; the (+) syntax is still supported.
Note: This statement is equivalent to the statement in the previous slide.
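Because the slide statement is not reproduced here, the following is a sketch of the SQL:1999
equivalent of the (+) example on the previous page; the nonjoin predicates move into the ON
clause so that the outer join is preserved:
SQL> select s.time_id, t.time_id
2 from sales s right outer join times t
3 on ( s.time_id = t.time_id
4 and s.promo_id = 19
5 and s.quantity_sold > 45 );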
Outer Joins
A full outer join acts like a combination of the left and right outer joins. In addition to the
inner join, rows from both tables that have not been returned in the result of the inner join are
preserved and extended with nulls. In other words, with full outer joins you can join tables
together, yet still show rows which do not have corresponding rows in joined-to tables.
For both the cost-based and rule-based approaches, the optimizer first determines whether
joining two or more of the tables definitely results in a row source containing at most one
row. The optimizer recognizes such situations based on UNIQUE and PRIMARY KEY
constraints on the tables. If such a situation exists, then the optimizer places these tables first
in the join order. The optimizer then optimizes the join of the remaining set of tables.
The CBO generates a set of execution plans based on the possible join orders, join methods,
and available access paths. The optimizer then estimates the cost of each plan and chooses the
one with the lowest cost. The optimizer estimates costs in these ways:
The cost of a nested loops operation is based on the cost of reading each selected row of the
outer table and each of its matching rows of the inner table into memory. The optimizer
estimates these costs using the statistics in the data dictionary. The cost of a sort merge join is
based largely on the cost of reading all the sources into memory and sorting them. The CBO’s
choice of join orders can be overridden with the ORDERED hint. If the ORDERED hint
specifies a join order that violates the rule for an outer join, then the optimizer ignores the hint
and chooses the order. Also, you can override the optimizer’s choice of a join method with
hints.
Execution of Outer Joins
Rule 1
A single-row predicate forces its row source to be
placed first in the join order.
Rule 2
For outer joins, the table with the outer join operator
(+) must come after the other table in the join order
for processing the join.
Tables Possible join orders
2 2! = 2
3 3! = 3 * 2 = 6
4 4! = 4 * 3 * 2 = 24
5 5! = 5 * 4 * 3 * 2 = 120
... ...
Because only the execution plans that follow a join predicate are considered, the number of
possible execution plans usually does not reach this value when the number of tables is greater
than 3. However, joining 20 tables still results in many possible execution plans. Remember
that all tables involved in a join, including those that are hidden in views, count as separate
tables.
Instructor Note
The exclamation mark used on the slide is the factorial operator from mathematics.
[Figure: star schema with the SALES fact table joined to the PRODUCTS and CUSTOMERS
dimension tables.]
Star Joins
One type of data warehouse design centers on what is known as a star schema, which is
characterized by one or more very large fact tables that contain the primary information in the
data warehouse and a number of much smaller dimension tables (or lookup tables), each of
which contains information about the entries for a particular attribute in the fact table.
A star join is a join between a fact table and a number of lookup tables. Each lookup table is
joined to the fact table by a primary-key to foreign-key join, but the lookup tables are not joined
to each other. The fact table normally has a concatenated index on the key columns to facilitate
this type of join.
A star join is executed using the normal join operations, but with a join order that does not
correspond to the join predicates: the smaller dimension tables are joined together first (like a
Cartesian product). The resulting row source is then used for a concatenated index access into
the fact table using an indexed nested loops join.
Note: The star transformation, which is discussed in the “Advanced Indexes” lesson, should
not be confused with the star joins. Whereas star joins work well for schemas with a small
number of dimensions and dense fact tables, star transformation works well for schemas with a
large number of dimensions and sparse fact tables.
STAR_TRANSFORMATION
The hint in the slide is not really a join hint, but rather a hint to optimize data access for the
fact table of a star schema. Refer to the “Advanced Indexes” lesson for a further description of
this hint.
DRIVING_SITE
The DRIVING_SITE hint forces query execution to be done at a different site than that
selected by the Oracle9i Server. This hint can be used with either rule-based or cost-based
optimization.
Example
SQL> SELECT /*+ DRIVING_SITE(co) */
2 c.cust_last_name
3 , c.cust_first_name
4 , co.country_name
5 FROM customers c
6 , countries@rsite co
7 WHERE c.country_id = co.country_id;
Instructor Note
Check whether your students are familiar with the countries@rsite notation for remote
tables; rsite is a database link reference.
Subqueries and Joins
Noncorrelated Subqueries
A noncorrelated subquery does not contain any references to the outer (main) query and can be
executed independently. For example:
SQL> SELECT c.*
2 FROM customers c
3 WHERE c.country_id IN
4 (SELECT co.country_id
5 FROM countries co
6 WHERE co.country_subregion = 'Asia');
This statement is executed as a hash join with the subquery as the outer table. It retrieves all
customers in the Asia subregion. This is the execution plan structure:
Execution Plan
---------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=363 Card=6250
Bytes=918750)
1 0 HASH JOIN (Cost=363 Card=6250 Bytes=918750)
2 1 TABLE ACCESS (FULL) OF 'COUNTRIES' (Cost=1 Card=2
Bytes=28)
3 1 TABLE ACCESS (FULL) OF 'CUSTOMERS' (Cost=360
Card=50000 Bytes=6650000)
Throwaway of Rows
Compare the number of rows from the two input row sources with the number of rows from the
join operation. If both input row sources have more rows than the join operation, you have
identified a throwaway of rows.
Examples
Rows Operation
100 NESTED LOOPS
45 TABLE ACCESS (…) OF our_outer_table
4530 TABLE ACCESS (…) OF our_inner_table
In this example, approximately 98% of the retrieved rows (4,430 out of 4,530 rows from the
inner table) are thrown away, so there should be room for performance improvement.
Rows Operation
100 MERGE (JOIN)
23 SORT (JOIN)
23 TABLE ACCESS (…) OF table_A
150 SORT (JOIN)
150 TABLE ACCESS (…) OF table_B
In this example, only a small fraction of the retrieved rows is thrown away, so you cannot
expect significant performance improvements by reducing throwaway. TKPROF is required to
determine the throwaway rows, because AUTOTRACE does not show this information.
Minimize Processing
Changing nested loops joins into sort/merge joins can remove index access but might add
significant sorting overhead.
Hash joins require less processing than the same sort/merge operation; hash joins are
theoretically the fastest algorithm to join tables. However, they need appropriate tuning.
Cluster joins need less I/O than the corresponding nested loops join because the child rows that
belong to one parent row are stored together in the same logical cluster block.
Summary
This lesson introduced you to Top-N SQL, performance issues with sorting, and how to
optimize join statements.
There are several different join operations that Oracle9i supports:
• Nested loops join
• Sort/merge join
• Hash join
• Cluster join
• Full outer Join
The Oracle9i optimizer optimizes joins in three phases. First, the optimizer generates a list of
all possible join orders. Then, for each possible join order, the best join operation is selected.
Finally, the Oracle9i optimizer determines the best access path for each row source of the
execution plan. The RBO optimizes joins differently from the CBO; specifying a hint forces the
CBO to be used.
Throwaway of rows occurs when both input row sources have more rows than the result of the
join operation. Minimize throwaway of rows to improve performance.
Instructor Note
Workshops 3 (Joins) and 4 (Subqueries) can be done at this point, if not completed already.
Optimizer Plan Stability
Objectives
After completing this lesson, you should be able to:
• Identify the purpose and benefits of optimizer plan stability
• Create stored outlines
• Use stored outlines
• Edit stored outlines
• Maintain stored outlines
USER_OUTLINE_HINTS
NAME Name of the outline
NODE ID of the query or subquery to which the hint applies (The top-level query is
labeled 1; subqueries start with 2.)
JOIN_POS Position of the table in the join order (The value is 0 for all hints except access
method hints, which identify a table to which the hint and the join position apply.)
HINT Text of the hint
[Figure: outline usage flowchart. If the statement is in the shared pool and belongs to the same
outline category, the outline plan is executed; otherwise, the data dictionary is queried for a
matching outline.]
Procedure Description
DROP_UNUSED Drops outlines that have not been used since they were
created
DROP_BY_CAT Drops outlines assigned to the specified category name
UPDATE_BY_CAT Reassigns outlines from one category to another
You can alter individual outlines by using the ALTER OUTLINE command and drop them
by using the DROP OUTLINE command.
You can export and import plans by exporting the schema OUTLN, where all outlines are
stored. Outlines can be queried from tables in the OUTLN schema:
• OL$ contains the outline name, category, create timestamp, and the text of the
statement.
• OL$HINTS contains the hints for the outlines in OL$.
Also, there are equivalent data dictionary views: DBA_OUTLINES and
DBA_OUTLINE_HINTS.
Note: Because the user OUTLN is automatically created at database creation with password
OUTLN, this password should be changed for security reasons.
Maintaining Stored Outlines
SQL> begin
2 outln_pkg.DROP_UNUSED;
3 outln_pkg.UPDATE_BY_CAT
4 ('DEFAULT','TRAIN');
5 outln_pkg.DROP_BY_CAT('TRAIN');
6 end;
• Join order
• Join methods
• Access methods
• Distributed execution plans
• Distribution methods for parallel query execution
• Query rewrite
• View and subquery merging
Editable Attributes
Join order: Join order defines the sequence in which tables are joined during query
execution. This includes tables produced by evaluating subqueries and views as well as
tables appearing in the FROM clauses of subqueries and views.
Join methods: Join methods define the methods used to join tables during query execution.
Examples are nested loops join and sort/merge join.
Access methods: Access methods define the methods used to retrieve table data from the
database. Examples are indexed access and full table scan.
Distributed execution plans: Distributed queries have execution plans that are generated for
each site at which some portion of the query is executed. The execution plan for the local site
at which the query is submitted can be controlled by plan stability and equivalent plans must
be produced at that site. In addition, driving site selection can be controlled centrally even
though it might normally change when certain schema changes are made.
Distribution methods: For parallel query execution, distribution methods define how the
inputs to execution nodes are partitioned.
View and subquery merging and summary rewrite: These cover all transformations in which
objects or operations that occur in one subquery of the original SQL statement migrate to a
different subquery for execution. Summary rewrite can also cause one set of objects or
operations to be replaced by another.
Outline Cloning
• Public outlines:
– Default setting when creating outlines
– Stored in the OUTLN schema
– Used when USE_STORED_OUTLINES is set to TRUE or a
category
• Private outlines:
– Stored in the user’s schema for the duration of the
session
– Can be edited
– Used when USE_PRIVATE_OUTLINES is set to TRUE or
a category
– Changes can be saved as public outlines
Outline Cloning
Public Outlines
In Oracle9i, all outlines are public objects indirectly available to all users on the system for
whom the USE_STORED_OUTLINES configuration parameter setting applies. Outline data
resides in the OUTLN schema that can be thought of as an extension to the SYS schema, in
the sense that it is maintained by the system only. You are discouraged from manipulating
this data directly to avoid security and integrity issues associated with outline data. Outlines
will continue to be public by default and only public outlines are generally available to the
user community.
Private Outlines
In Oracle9i, the notion of a private outline to aid in outline editing is introduced. A private
outline is an outline seen only in the current session and whose data resides in the current
parsing schema. By storing the outline data for a private outline directly in the user’s schema,
users are given the opportunity to manipulate the outline data directly through DML in
whichever way they choose. Any changes made to such an outline are not seen by any other
session on the system and applying a private outline to the compilation of a statement can be
done only in the current session through a new session parameter. Only when a user
explicitly chooses to save edits back to the public area do the rest of the users see them.
An outline clone is a private outline that has been created by copying data from an existing
outline.
Outline: Administration and Security
• V$SQL:
– OUTLINE_SID column added
– Identifies the session ID from which the outline was retrieved
Configuration Parameters
The USE_PRIVATE_OUTLINES session parameter is added to control the use of private
outlines instead of public outlines. When an outlined SQL command is issued, this parameter
causes outline retrieval to come from the session private area rather than the public area that
is usually consulted as per the setting of USE_STORED_OUTLINES. If no outline exists in
the session private area, no outline is used for the compilation of the command.
You can specify a value for this session parameter by using the following syntax:
ALTER SESSION SET USE_PRIVATE_OUTLINES =
TRUE | FALSE | category_name ;
where:
• TRUE enables use of private outlines and defaults to the DEFAULT category
• FALSE disables use of private outlines
• category_name enables use of private outlines in the named category
When a user begins an outline editing session, the parameter should be set to the category to
which the outline being edited belongs. This enables the feedback mechanism in that it
allows the private outline to be applied to the compilation process.
Upon completion of outline editing, this parameter should be set to FALSE to restore the
session to normal outline lookup as dictated through the USE_STORED_OUTLINES
parameter.
dbms_outln_edit.refresh_private_outline(
'private_outline1')
Summary
This lesson introduced you to the use of stored outlines, their influence on optimizer plan
stability, and the use of the OUTLN_PKG package to manage stored outlines.
Oracle9i provides you with a means of stabilizing execution plans across Oracle releases,
database changes, or other factors that can cause an execution plan to change. You can create
a stored outline containing a set of hints used by the optimizer to create an execution plan.
Stored outlines rely partially on hints that the optimizer uses to achieve stable execution
plans. These plans are maintained through many types of database and instance changes. You
can use stored outlines to ensure that all your customers access the same execution plans.
You can create outlines for a session, or you can create them for specific SQL statements.
To use stored outlines, the USE_STORED_OUTLINES option must be set to TRUE or a
category name. Use procedures in the OUTLN_PKG package to manage stored outlines and
their categories. There are data dictionary views that store information on the outlines.
Note: For more details about plan stability and using stored outlines, refer to Oracle9i
Tuning, “Optimizer Modes, Plan Stability, and Hints.”
Practice Overview
In this practice, you create, use, and then remove a stored outline. You must alter your
session in order for the outline to be used. Use the USER_OUTLINES and
USER_OUTLINE_HINTS data dictionary views to view the status of your outline.
Objectives
After completing this lesson, you should be able to:
• Create bitmapped indexes, identify bitmapped index operations, and use bitmapped index
hints.
• Describe, identify and use star transformations.
• Create and use function-based indexes.
• View data in the data dictionary on indexes.
Bitmapped Indexes
Bitmap indexing provides both substantial performance improvements and space savings over
usual (B*-tree) indexes when there are few distinct values in the indexed columns.
If the number of distinct keys in a column is less than or equal to 1% of the total number of
rows for a table, then the column has low cardinality. For example, in a table with 10,000 rows,
if there are fewer than 100 different values in a particular column, then that column has low
cardinality, and a bitmap index may provide the best performance for queries based on that
column.
The maximum number of columns in a single bitmapped index is 30.
The Oracle9i Server compresses bitmap storage, making bitmapped indexes very
storage-efficient. The bitmaps are stored internally in a B*-tree structure to maximize access
performance.
Note: Rule-based optimization does not consider using bitmapped indexes.
[Figure: a bitmap index entry. Each bit position corresponds to a row (Row #1, Row #2,
Row #3, and so on); a bit is 1 when the row contains the indexed key value and 0 otherwise.]
Option Description
LOCAL Must be a locally partitioned bitmap index
LOGGING|NOLOGGING Determines whether the creation of the index will be
logged (LOGGING) or not logged (NOLOGGING) in
the redo log file
COMPUTE STATISTICS Collects statistics at relatively little cost during the
creation of an index
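For example, a minimal sketch of creating a bitmapped index on a low-cardinality column (the
index name is hypothetical):
SQL> CREATE BITMAP INDEX prod_supplier_bix
2 ON products (supplier_id)
3 COMPUTE STATISTICS;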
SQL> SELECT *
2 FROM products
3 WHERE supplier_id = 3;
[Figure: the bitmap for the key value SUPPLIER_ID = 3; the positions containing 1 identify the
rows returned.]
A B A and B A or B not A
1 1 1 1 0
0 1 0 1 1
1 0 0 1 0
0 0 0 0 1
0 1 0 1 1
Each bitmap index entry has the format <key, start ROWID, end ROWID, bitmap>, for example:
<Rognes, 1.2.3, 10.8000.3, 1000100100010010100…>
<Aix-en-Provence, 1.2.3, 10.8000.3, 0001010000100100000…>
<Marseille, 1.2.3, 10.8000.3, 0100000011000001001…>
SELECT SUM(s.amount_sold)
FROM sales s, customers c
WHERE s.cust_id = c.cust_id
AND c.cust_city = 'Marseille';
Only the index and the SALES table are used to evaluate the query. No join with the
CUSTOMERS table is needed.
• Advantages:
– Good performance for join queries and space efficient
– Is especially useful for large dimension tables in star
schemas
• Disadvantages:
– More indexes are required: Up to one index per
dimension table column rather than one index per
dimension table.
– Maintenance costs are higher: Building or refreshing a
bitmap join index requires a join.
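For example, a minimal sketch of creating a bitmap join index (the index name is hypothetical;
LOCAL is included because SALES is assumed to be partitioned):
SQL> CREATE BITMAP INDEX sales_cust_city_bjix
2 ON sales (customers.cust_city)
3 FROM sales, customers
4 WHERE sales.cust_id = customers.cust_id
5 LOCAL;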
Index Hints
Most of the following index hints are covered in the “Influencing the Optimizer” lesson:
INDEX(t [idx]...) Chooses an (ascending) index scan for the specified table
INDEX_ASC(t [idx]...) Chooses an ascending index scan for the specified table
INDEX_DESC(t [idx]...) Chooses a descending index scan for the specified table
AND_EQUAL Chooses an execution plan that uses an access path,
(t idx idx [idx]…) merging the scans on several single-column
indexes (minimum 2, maximum 5)
INDEX_COMBINE(t [idx]...) Uses a Boolean combination of bitmapped indexes
INDEX_FFS(t [idx]...) Causes a fast full index scan to be performed
NO_INDEX (t [idx]…) Explicitly disallows a set of indexes for the specified table
Execution Plan
--------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=189
Card=240 Bytes=18480)
1 0 TABLE ACCESS (BY INDEX ROWID) OF
'COPY_SALES' (Cost=189 Card=240 Bytes=18480)
2 1 BITMAP CONVERSION (TO ROWIDS)
3 2 BITMAP OR
4 3 BITMAP MINUS
5 4 BITMAP INDEX (SINGLE VALUE) OF
'CS_PROMO_ID_BIX'
6 4 BITMAP INDEX (SINGLE VALUE) OF
'CS_CUST_ID_BIX'
7 3 BITMAP MERGE
8 7 BITMAP INDEX (RANGE SCAN) OF
'CS_PROD_ID_BIX'
[Figure: star schema with the SALES fact table surrounded by the PRODUCTS and
CUSTOMERS dimension tables.]
Star Transformation
One type of data warehouse design centers around what is known as a star schema, which is
characterized by one or more large fact tables that contain the primary information in the data
warehouse and a number of much smaller dimension tables (or lookup tables), each of which
contains information about the entries for a particular attribute in the fact table.
A star query is a join between a fact table and a number of lookup tables. Each lookup table is
joined to the fact table using a primary-key to foreign-key join, but the lookup tables are not
joined to each other.
Star queries are quite challenging to tune. There are two basic approaches to this problem:
• Do as smart a job as possible. The “Sorting and Join Techniques” lesson discussed how to
use star joins to optimize star queries.
• Transform the statement into an easier one. This lesson discusses this approach, using
bitmapped indexes. This approach is also known as star transformation.
SQL> SELECT *
2 FROM customers
3 WHERE upper(cust_last_name) = 'SMITH';
Function-Based Indexes
You can use Oracle9i to create indexes based on column expressions (virtual columns). Function-
based indexes provide an efficient mechanism for evaluating statements that contain expressions
in their WHERE clauses. For example, you can use function-based indexes to create case-
insensitive indexes, as the example in the slide shows.
You can create a function-based index to materialize computational-intensive expressions in the
index, so that the Oracle server does not need to compute the value of the expression when
processing SQL statements.
Function-based index expressions can use any function that is deterministic; that is, the returned
value must not change for a given set of input values. PL/SQL functions used in defining
function-based indexes must be DETERMINISTIC. The index owner needs the EXECUTE
privilege on the defining function. If the EXECUTE privilege is revoked, then the function-based
index is marked DISABLED. A function-based index can be created as a bitmap index.
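For example, a minimal sketch of the case-insensitive index assumed by the query in the slide
(the index name is hypothetical):
SQL> CREATE INDEX cust_upper_name_ix
2 ON customers (UPPER(cust_last_name));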
Enabling Function-Based Indexes
Use the following command to enable function-based indexes:
SQL> alter session set query_rewrite_enabled = true;
Query rewrites are discussed in more detail in the “Materialized Views and Temporary Tables”
lesson.
Instructor Note
The Oracle server does not need to compute the value of the expression when processing SELECT
and DELETE statements. For INSERT and UPDATE statements the Oracle server must evaluate
the expression to update the index. Demonstration script: demo11_03.sql.
Function-Based Indexes: Usage
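One way to inspect the indexed expression is to query the data dictionary (a sketch, assuming the index created earlier in this lesson):
SQL> SELECT column_expression
  2  FROM user_ind_expressions
  3  WHERE index_name = 'CUST_UPPER_NAME_IDX';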
COLUMN_EXPRESSION
-------------------------------------------------
UPPER("CUST_LAST_NAME")
Instructor Note
Demonstration script: demo11_04.sql
Summary
This lesson introduced you to bitmapped indexes, star transformations, and function-based
indexes.
• Bitmap indexing provides both substantial performance improvements and space savings
over conventional (B*-tree) indexes when there are few distinct values in the indexed columns.
• The star transformation is a cost-based query transformation aimed at executing star
queries efficiently. Star transformations are used in data warehousing, where a common
design is the star schema: one or more large fact tables containing the primary information,
surrounded by a number of much smaller dimension (lookup) tables.
• With function-based indexes you can create indexes based on column expressions (virtual
columns). Function-based indexes provide an efficient mechanism for evaluating
statements that contain expressions in their WHERE clauses, or for supporting linguistic
sorting.
You can use the data dictionary views to retrieve information about your indexes.
Objectives
After completing this lesson, you should be able to:
• Identify the purpose and benefits of materialized views
• Create materialized views
• Enable query rewrites
• Create dimensions
• Identify the benefits of temporary tables
A materialized view:
• Is an “instantiation” of a SQL statement
• Has its own data segment and offers:
– Space management options
– Use of its own indexes
• Is useful for:
– Expensive and complex joins
– Summary and aggregate data
Materialized Views
A materialized view stores both the definition of a view and the rows resulting from the
execution of the view. Like a view, it uses a query as the basis, but the query is executed at the
time the view is created, and the results are stored in a table. You can define the table with the
same storage parameters as any other table and place it in the tablespace of your choice. You
can also index the materialized view table like other tables to improve the performance of
queries executed against it.
When a query can be satisfied with data in a materialized view, the Oracle9i Server transforms
the query to reference the view rather than the base tables. Using a materialized view in this
way avoids re-executing expensive operations such as joins and aggregations.
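A minimal sketch of creating a materialized view that is eligible for query rewrite (the view name and the defining query are assumptions for illustration):
SQL> CREATE MATERIALIZED VIEW monthly_sales_mv
  2  BUILD IMMEDIATE
  3  REFRESH COMPLETE
  4  ENABLE QUERY REWRITE
  5  AS
  6  SELECT prod_id, SUM(amount_sold) AS total_sold
  7  FROM sales
  8  GROUP BY prod_id;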
Instructor Note
Materialized views (MVs) are replacing read-only snapshots. However, MVs provide greater
functionality, such as the ability to allow query rewrites directly or through dimensions, as
covered in this lesson. The SNAPSHOT keyword is supported in place of MATERIALIZED
VIEW for backward compatibility.
There were several issues and restrictions with MVs and query rewrites in 8.1.5. For example,
as soon as you have a non-join predicate in the MV definition, only exact matches will result in
a query rewrite. Make sure to test any MV demonstration, and don’t improvise.
• Refresh types:
– COMPLETE
– FAST
– FORCE
– NEVER
• Create materialized view logs for FAST refreshes
SQL> CREATE MATERIALIZED VIEW LOG ON …
• Manual refresh
– By using the DBMS_MVIEW package
• Automatic refresh
– Synchronous: Upon commit of changes made to the
underlying tables—but independent of the committing
transaction
– Asynchronous: Define a refresh interval
for the materialized view
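To illustrate the bullets above, a sketch of creating a materialized view log and performing a manual complete refresh with the DBMS_MVIEW package (the object names are assumptions; 'C' requests a complete refresh):
SQL> CREATE MATERIALIZED VIEW LOG ON sales;

SQL> EXECUTE DBMS_MVIEW.REFRESH('MONTHLY_SALES_MV', 'C');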
Query Rewrites
Because accessing a materialized view may be significantly faster than accessing the
underlying base tables, the optimizer rewrites a query to access the view when the query allows
it. The query rewrite activity is transparent to applications. In this respect, their use is similar to
the use of an index.
Users do not need explicit privileges on materialized views to use them. Queries executed by
any user with privileges on the underlying tables can be rewritten to access the materialized
view.
A materialized view can be enabled or disabled. A materialized view that is enabled is
available for query rewrites.
When a query is rewritten, its execution plan shows the materialized view being accessed
instead of the base tables:

OPERATION                NAME
------------------------ --------------------
SELECT STATEMENT
TABLE ACCESS FULL        fweek_pscat_costs_mv
Dimensions
Dimensions are data dictionary structures that define hierarchies based on columns in existing
database tables. Although they are optional, they are recommended because they:
• Enable additional rewrite possibilities without the use of constraints. (Implementing
constraints may not be desirable in a data warehouse for performance reasons.)
• Help document dimensions and hierarchies explicitly.
• Can be used by OLAP tools.
A dimension is an object stored in the data dictionary that enables a query to leverage the
results of an existing summary to produce another summary much more cheaply. For example,
if the original query performed 1,700 I/Os, a query using the materialized view might take
about 170 I/Os, and a query rewritten using a dimension might take approximately 17 I/Os.
Instructor Note
If you are running out of time, you can say that dimensions enable additional query rewrite
capabilities, specifically targeted at data warehouse environments; then you can refer them to
the excellent chapters in the Oracle9i Data Warehousing Guide manual for more details. You
can then continue to discuss temporary tables, the last topic of this lesson.
[Figure: two hierarchies defined on the times table. The CAL_ROLLUP hierarchy rolls up Day (TIME_ID) to Month (CALENDAR_MONTH_DESC) to Quarter (CALENDAR_QUARTER_DESC) to Year (CALENDAR_YEAR) to All; the FIS_ROLLUP hierarchy rolls up Day to Fiscal week to Fiscal year to All. Level attributes include DAY_NAME, CALENDAR_MONTH_NAME, and DAYS_IN_CAL_QUARTER.]
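Based on the CAL_ROLLUP hierarchy in the figure, a dimension might be defined as follows (a sketch, assuming a times table with the columns shown):
SQL> CREATE DIMENSION times_dim
  2  LEVEL day IS times.time_id
  3  LEVEL month IS times.calendar_month_desc
  4  LEVEL quarter IS times.calendar_quarter_desc
  5  LEVEL year IS times.calendar_year
  6  HIERARCHY cal_rollup
  7  (day CHILD OF month CHILD OF quarter CHILD OF year)
  8  ATTRIBUTE day DETERMINES (day_name);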
Temporary Tables
With the Oracle8i Server and later, you can create temporary tables. Temporary tables can
improve performance significantly by holding temporary data for reuse within your transaction
or session.
A temporary table has the following properties:
• Temporary table data is visible only within its defined scope; the scope can be defined to
be a session or a transaction.
• The definition of a global temporary table is visible to all sessions. In contrast, the
definition of a local temporary table does not persist at the end of the session that creates
it.
• Temporary table data is stored within the sort space used by the session. If sort space is
not sufficient to accommodate the data, space is allocated in the user’s temporary
tablespace.
• Indexes on temporary tables have the same scope and duration as the table they
correspond to.
• Triggers and views can be defined on temporary tables. However, a view cannot be
defined joining a temporary and a permanent table.
• The CREATE GLOBAL TEMPORARY TABLE AS SELECT command can be used to
create a temporary table and insert data into it.
• Definitions of temporary tables can be exported and imported.
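A minimal sketch of creating a session-scoped temporary table and confirming its duration in the data dictionary (the table definition and the empty-result predicate are assumptions; the dictionary columns match the output below):
SQL> CREATE GLOBAL TEMPORARY TABLE sales_detail_temp
  2  ON COMMIT PRESERVE ROWS
  3  AS SELECT * FROM sales WHERE 1 = 0;

SQL> SELECT table_name, temporary, duration
  2  FROM user_tables
  3  WHERE table_name = 'SALES_DETAIL_TEMP';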
TABLE_NAME                     T DURATION
------------------------------ - ------------
SALES_DETAIL_TEMP              Y SYS$SESSION
Summary
This lesson introduced you to materialized views, query rewrites, dimensions and hierarchies,
and temporary tables:
• A materialized view stores both the definition of a view and the rows resulting from the
execution of the view. Like a view, it uses a query as the basis, but the query is executed
at the time the view is created, and the results are stored in a table.
• Because accessing a materialized view can be significantly faster than accessing the
underlying base tables, the optimizer rewrites a query to access the view when the query
allows it. The query rewrite activity is transparent to applications.
• Dimensions are data dictionary structures that define hierarchies based on columns in
existing database tables. Dimensions describe business entities such as products,
departments, and time in a hierarchical, categorized manner. A dimension can consist of
one or more hierarchies. Each hierarchy comprises multiple levels.
• The Oracle9i Server enables you to create temporary tables. Temporary tables can
improve performance significantly by holding temporary data for reuse within your
transaction or session.
Objectives
After completing this lesson, you should be able to identify and describe the benefits of:
• Index-organized tables
• External tables
[Figure: three storage structures compared. A regular table stores each row with a row header, a key column, and non-key columns, addressed by ROWID; an external table reads its data from an OS file; an index-organized table stores the key and non-key columns together in a B*-tree structure.]
IOT Limitations
• Index-organized tables must have a primary key. This is the unique identifier and is used as
the basis for ordering; there is no ROWID to act as a unique identifier in the B*-tree
structure.
• Index-organized tables cannot be part of an index cluster or a hash cluster.
• Index-organized tables cannot include LONG columns, but they can contain LOB columns.
Because index-organized tables are B*-tree structures, they are subject to fragmentation as a
result of incremental updating. You can use the ALTER TABLE … MOVE command to rebuild
the index-organized table:
ALTER TABLE iot_tablename MOVE [OVERFLOW...];
Specifying the optional OVERFLOW clause causes the overflow segment to be rebuilt as well.
Overflow segments are explained on the following pages.
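A minimal sketch of creating an index-organized table and later rebuilding it with MOVE (the table definition is an assumption):
SQL> CREATE TABLE countries_iot
  2  (country_id CHAR(2) PRIMARY KEY,
  3  country_name VARCHAR2(40))
  4  ORGANIZATION INDEX;

SQL> ALTER TABLE countries_iot MOVE;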
Instructor Note
Oracle8i removed the IOT restrictions regarding secondary indexes and additional UNIQUE
constraints. You can mention the absence of real ROWIDs as an IOT limitation, although the
implementation of logical ROWIDs (with the UROWID data type) resolved all related
shortcomings.
Note that the MOVE option for the ALTER TABLE statement was new in Oracle8i.
When to Use Index-Organized Tables
USER_TABLES           USER_INDEXES
-----------           ------------
TABLE_NAME            TABLE_NAME
IOT_TYPE              INDEX_NAME
IOT_NAME              INDEX_TYPE
TABLESPACE_NAME       PCT_THRESHOLD
                      INCLUDE_COLUMN
SELECT *
FROM ex_table;

[Figure: querying an external table reads data from an OS file]
USER_EXTERNAL_TABLES        USER_EXTERNAL_LOCATIONS
--------------------        -----------------------
TABLE_NAME                  TABLE_NAME
TYPE_OWNER                  LOCATION
TYPE_NAME                   DIRECTORY_OWNER
DEFAULT_DIRECTORY_OWNER     DIRECTORY_NAME
DEFAULT_DIRECTORY_NAME
REJECT_LIMIT
ACCESS_TYPE
ACCESS_PARAMETERS
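A sketch of defining the external table queried above (the directory path, file name, and column list are assumptions for illustration):
SQL> CREATE DIRECTORY ext_dir AS '/home/oracle/data';

SQL> CREATE TABLE ex_table
  2  (sales_id NUMBER,
  3  amount NUMBER)
  4  ORGANIZATION EXTERNAL
  5  (TYPE oracle_loader
  6  DEFAULT DIRECTORY ext_dir
  7  ACCESS PARAMETERS
  8  (RECORDS DELIMITED BY NEWLINE
  9  FIELDS TERMINATED BY ',')
 10  LOCATION ('sales.dat'));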
Summary
This lesson introduced you to alternative storage techniques, including index-organized tables
and external tables.
• An index-organized table (IOT) is like a regular table with an index on one or more of its
columns, but instead of maintaining two separate segments for the table and the B*-tree
index, the Oracle server maintains one single B*-tree structure which contains the primary
key value and other (non-key) column values for the row.
• An external table is a data source that is located outside of the Oracle database. After the
external table is set up using the CREATE TABLE ... ORGANIZATION EXTERNAL
syntax, you can query this data source using a SELECT statement.
Objectives
Data warehousing applications often have requirements to extract data from multiple
sources, transform it to different formats, transport it to different platforms, and load it into
database tables. The Oracle9i database has a number of new features that streamline these
processes.
For more information, see Oracle9i Data Warehousing Guide and Oracle9i SQL Reference.
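The AUTOTRACE statistics below come from the demonstration scripts referenced in the Instructor Note; a WITH clause query of the following shape is the kind of statement being measured (a sketch; the schema objects are assumptions):
SQL> WITH dept_costs AS (
  2  SELECT department_id, SUM(salary) AS dept_total
  3  FROM employees
  4  GROUP BY department_id)
  5  SELECT *
  6  FROM dept_costs
  7  WHERE dept_total > 100000;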
Statistics
-----------------------------------------------------
4 recursive calls
22 db block gets
720 consistent gets
20 physical reads
520 redo size
2250 bytes sent via SQL*Net to client
634 bytes received via SQL*Net from client
3 SQL*Net roundtrips to/from client
2 sorts (memory)
0 sorts (disk)
25 rows processed
Instructor Note
Demonstration script: demo14_01.sql and demo14_02.sql
Purpose: These scripts run the statements shown on pages 5 and 6 to demonstrate the WITH
clause statistics.
Merge Syntax
INTO Clause
Use the INTO clause to specify the target table you are updating or inserting into.
USING Clause
Use the USING clause to specify the source of the data to be updated or inserted. The source
can be a table, view, or the result of a subquery.
ON Clause
Use the ON clause to specify the condition upon which the MERGE operation either updates
or inserts. For each row in the target table for which the search condition is true, Oracle
updates the row with corresponding data from the source table. If the condition is not true
for a source row, Oracle inserts a new row into the target table based on that source row.
WHEN MATCHED | NOT MATCHED
Use these clauses to instruct Oracle how to respond to the results of the
join_condition. You can specify these two clauses in either order.
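Putting the clauses together, a minimal MERGE sketch (the table and column names are assumptions):
SQL> MERGE INTO sales_summary s
  2  USING sales_detail d
  3  ON (s.sales_id = d.sales_id)
  4  WHEN MATCHED THEN
  5  UPDATE SET s.amount = d.amount
  6  WHEN NOT MATCHED THEN
  7  INSERT (sales_id, amount)
  8  VALUES (d.sales_id, d.amount);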
Summary
In this lesson, you should have learned about some techniques used in data warehousing that
help to transform, transport, and load data.