Physical Database Design Using Oracle
AUERBACH PUBLICATIONS
A CRC Press Company Boca Raton London New York Washington, D.C.
This book contains information obtained from authentic and highly regarded sources. Reprinted material is quoted with permission, and sources are indicated. A wide variety of references are listed. Reasonable efforts have been made to publish reliable data and information, but the author and the publisher cannot assume responsibility for the validity of all materials or for the consequences of their use.

Neither this book nor any part may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, microfilming, and recording, or by any information storage or retrieval system, without prior permission in writing from the publisher. The consent of CRC Press LLC does not extend to copying for general distribution, for promotion, for creating new works, or for resale. Specific permission must be obtained in writing from CRC Press LLC for such copying. Direct all inquiries to CRC Press LLC, 2000 N.W. Corporate Blvd., Boca Raton, Florida 33431.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation, without intent to infringe.
DEDICATION
This book is dedicated to Jean Lavender, a real survivor and feisty lady, whose courage has been an inspiration.
CONTENTS
1 Introduction to Oracle Physical Design
   Preface
   Relational Databases and Physical Design
   Systems Development and Physical Design
   Systems Analysis and Physical Database Design
   The Structured Specification
   The Role of Functional Decomposition in Physical Database Design
   Introduction to Logical Database Design
   Unnormalized Form
   Nested Tables
   First Normal Form
   Second Normal Form
   Third Normal Form
   E/R Modeling
   Bridging between Logical and Physical Models
   Activities of Oracle Physical Design
   Physical Design Requirements Validation
   How to Identify a Poor Requirements Evaluation
   Functional Validation
   How to Spot a Poor Functional Analysis
   Evaluating the Worth of an Existing System
   Locating Oracle Physical Design Flaws
2 Physical Entity Design for Oracle
   Introduction
   Data Relationships and Physical Design
   Redundancy and Physical Design
   The Dangers of Overnormalization
   Denormalizing One-to-Many Data Relationships
   Denormalizing Many-to-Many Data Relationships
   Recursive Data Relationships
3 Oracle Hardware Design
   Introduction
   Planning the Server Environment
   Design for Oracle Server CPU
   Designing Task Load Balancing Mechanisms
   Design for Oracle Server RAM
   Making Oracle Memory Nonswappable
   Design for the Oracle Server Swap Disk
   Designing the Network Infrastructure for Oracle
   Oracle Network Design
   The tcp.nodelay parameter
   The automatic_ipc parameter
   The break_poll_skip parameter
   The disable_oob parameter
   The SDU and TDU parameters
   The queuesize Parameter in listener.ora
   Connection Pooling and Network Performance
   ODBC and Network Performance
   Oracle Replication Design
   Oracle Disk Design
   Conclusion
4 Oracle Instance Design
   Introduction
   Reserving RAM for Database Connections
   RAM Used by Oracle Connections
   Determining the Optimal PGA Size
   A Script for Computing Total PGA RAM
   SGA Parameter Components
   Designing the Shared Pool
   Library Cache Usage Measurement
   Oracle Event Waits
   The Shared Pool Advisory Utility
   Designing the Data Buffers
   Using v$db_cache_advice
   Design with the DBHR
   Using Statspack for the DBHR
   Data Buffer Monitoring with Statspack
   Pinning Packages in the SGA
   Automatic Repinning of Packages
   Designing Logon Triggers to Track User Activity
   Designing a User Audit Table
   User Table Normalization
   Designing a Logon Trigger
   Designing the Logoff Trigger
   User Activity Reports
   User Logon Detail Reports
   Designing Oracle Failover Options
   Conclusion
6 Oracle Table Design
   Introduction
   Table Replication Design
   Is the Transfer of Data Time Sensitive?
   Is the Number of Tables Manageable?
   Do All Your Replicated Tables Need to Be Updatable?
   Does Your Database Change Constantly?
   Is the Number of Transactions Manageable?
   Are You Replicating between Different Versions of Oracle or Different OSs?
   Do Both Sites Require the Ability to Update the Same Tables?
   Does the Replicated Site Require the Ability to Replicate to Another Site?
7 Oracle Index Design
   Introduction
   Index Design Basics
   The Oracle B-Tree Index
   Bitmapped Indexes
   Function-Based Indexes
   Index-Organized Tables
   Evaluating Oracle Index Access Methods
   Index Range Scan
   Fast Full-Index Scan
   Designing High-Speed Index Access
   Speed Factors
   Parallel Option
   Nologging Option
   Space and Structure Factors
   Compress Option
   Tablespace Block Size Option
   Designing Indexes to Reduce Disk I/O
   Oracle Optimizer and Index Design
   Physical Row-Ordering and Index Design
   Constraints and Index Design
   Using Multicolumn Indexes
   How Oracle Chooses Indexes
   Index Design for Star Schemas
   Indexing Alternatives to B-Tree Indexes
   Bitmap Indexes
   Function-Based Indexes
   Reverse-Key Indexes and SQL Performance
   Index Usage for Queries with IN Conditions
   Design for Oracle Full-Index Scans
   Oracle and Multiblock Reads
   Basics of FBIs
   Indexing on a Column with NULL Values
   Invoking the Full-Index Scan with a FBI
   An Important Oracle Enhancement
   How to Use Oracle9i Bitmap Join Indexes
   How Bitmap Join Indexes Work
   Bitmap Join Indexes in Action
   Exclusions for Bitmap Join Indexes
   Design for Automatic Histogram Creation
   The method_opt=SKEWONLY dbms_stats Option
   Conclusion
Index
PREFACE
The evolution of the Oracle database has led to a revolution in design practices. As of Oracle Database 10g, the database physical structures have become more complex than ever before and database designers face a plethora of physical ways to implement the logical models. The purpose of this book is to correlate the logical data model with the physical implementation structures provided by Oracle Corporation. Oracle Database 10g offers object-oriented data structures and pure relational data structures, as well as specialized data structures such as index-organized tables. Given so many choices, Oracle designers must understand the appropriate use of each physical technology and how it maps to their logical data models. This book targets the practicing Oracle professional who already has exposure to basic Oracle database administration. It is my hope that this text provides you with the insights you need to choose the appropriate physical model for your mission-critical application.

Regards,
Donald K. Burleson
OTHER BOOKS BY THE AUTHOR

Oracle Privacy Security Auditing: Includes Federal Law Compliance with HIPAA, Sarbanes-Oxley & the Gramm-Leach-Bliley Act (GLB), Rampant TechPress, 2003
Oracle Index Management Secrets: Top Oracle Experts Discuss Index Management Techniques, Rampant TechPress, 2003
Oracle SQL Internals Handbook, Rampant TechPress, 2003
Oracle Space Management Handbook, Rampant TechPress, 2003
Advanced SQL Database Programmer Handbook, Rampant TechPress, 2003
The Data Warehouse eBusiness DBA Handbook, Rampant TechPress, 2003
Oracle9iAS Administration Handbook, Oracle Press, 2003
Creating a Self-Tuning Oracle Database: Automatic Oracle9i Dynamic SGA Performance, Rampant TechPress, 2003
Conducting the Oracle Job Interview: IT Manager Guide for Oracle Job Interviews with Oracle Interview Questions, Rampant TechPress, 2003
Oracle9i UNIX Administration Handbook, Oracle Press, 2002
Oracle9i High Performance Tuning with STATSPACK, Oracle Press, 2002
Oracle Internals: Tips, Tricks, and Techniques for DBAs, CRC Press, 2001
Oracle High Performance SQL Tuning, Oracle Press, 2001
Oracle High Performance Tuning with STATSPACK, Oracle Press, 2001
UNIX for Oracle DBAs Pocket Reference, O'Reilly & Associates, 2000
Oracle SAP Administration, O'Reilly & Associates, 1999
Inside the Database Object Model, CRC Press, 1998
High Performance Oracle Data Warehousing: All You Need to Master Professional Database Development Using Oracle, Coriolis Publishing, 1997
High Performance Oracle8 Tuning: Performance and Tuning Techniques for Getting the Most from Your Oracle8 Database, Coriolis Publishing, 1997
High Performance Oracle Database Applications: Performance and Tuning Techniques for Getting the Most from Your Oracle Database, Coriolis Publishing, 1996
Oracle Databases on the Web: Learn to Create Web Pages that Interface with Database Engines, Coriolis Publishing, 1996
Managing Distributed Databases: Building Bridges between Database Islands, John Wiley & Sons, 1995
Practical Application of Object-Oriented Techniques to Relational Databases, John Wiley & Sons, 1994
1
INTRODUCTION TO ORACLE PHYSICAL DESIGN
PREFACE
Over the past 30 years, we've seen the evolution of a wide variety of systems analysis and design methodologies. We've seen the methodologies of Grady Booch, Ed Yourdon, and Chris Gane and Trish Sarson, as well as the emergence of standard systems development methodologies such as joint application development and Unified Modeling Language (UML). Regardless of the methodology, at some point in the systems implementation, the database designer must be able to convert a logical data model into physical data structures. From a database point of view, it is incidental whether you're dealing with a commercial database management system (DBMS), such as MySQL or Oracle, or whether you're writing your own DBMS in a language such as C or C++. The point is that we must be able to take the logical data models and convert them into physical implementations that will minimize disk input/output (I/O) and provide the fastest possible throughput. We need to be able to implement the DBMS in such a fashion that performance will be fast while preserving the logical data structures. This book is dedicated to the premise that the database designer should be able to take logical data models and convert them into a series of data structures that allow for fast and easy, logical access to the data.
Simplicity: the concept of tables with rows and columns is extremely simple and easy to understand. End users have a simple data model. Complex network diagrams used with the hierarchical and network databases are not used with a relational database.

Data independence: data independence is the ability to modify data structures (in this case, tables) without affecting existing programs. Much of this is because tables are not hard-linked to one another. Columns can be added to tables, tables can be added to the database, and new data relationships can be added with little or no restructuring of the tables. A relational database provides a much higher degree of data independence than do hierarchical and network databases.

Declarative data access: with Structured Query Language (SQL), users specify what data they want; the database engine determines how to get the data. In relational database access, the user tells the system the conditions for the retrieval of data. The system then gets the data that meets the selection conditions in the SQL statements. The database navigation is hidden from the end user or programmer, unlike a Conference on Data Systems Languages (CODASYL) Data Manipulation Language (DML), where the programmer had to know the details of the access path.

The most important point about SQL is that it provided programmers and end users with a simple, easy way to add, change, and extract data from a relational database. Any two tables could be joined together on the fly at runtime using their primary or foreign keys. There are no pointers or hard links from one table to another.
[Figure 1.1: The stages of system development. User requirements flow into (1) analysis (E/R model, data dictionary, data flow diagrams, process logic specifications), (2) design (input/output design, data dictionary), and (3) implementation (the finished system).]
2. Systems analysis: systems analysis is a logical description of the data sources for the warehouse, data extraction analysis, data cleansing analysis, and data loading analysis. Unlike a traditional system, the warehouse analysis is heavily data-centric and not concerned with defining the system interfaces.
3. Logical design: the systems design phase is the physical implementation of the logical data model that was developed in the systems analysis phase. This includes the design of the warehouse, specifications for data extraction tools, data loading processes, and warehouse access methods. In this phase, a working prototype should be created for the end user.
4. Physical design: the system design phase is also where the logical documentation is transformed into a physical structure. For database design, this involves the creation of the entity/relation (E/R) model and the determination of appropriate data storage techniques and index usage. This phase is where a thorough understanding of Oracle database architecture will pay off.
5. Implementation: the implementation phase is the phase in which the warehouse is constructed and the software is written and tested. As shown in Figure 1.1, the implementation phase normally consumes as much effort as all of the other steps combined. Regardless of the reasons, it remains true that the implementation phase is by far the most time-consuming phase in the creation of any system.
If a development team has done a good job of analyzing, designing, and coding a new system, you might suspect that the programming team would disband immediately after coding is completed. But this is seldom the case. The cost curve continues to grow after a system has been delivered; this can be attributed to the dynamic nature of systems requirements. Almost by definition, most long-term development efforts will deliver an obsolete system to their end users. The end users often lament, "You gave me the system that I needed two years ago when you began the project! Many requirements have changed, even while you were creating the system." This is a common complaint, and it's not surprising to see that the programming staff immediately begins addressing the maintenance requests that have been stacking up while they were initially creating the system. A traditional computer system will continually become more and more expensive to maintain, until the cumulative costs exceed the benefits of the system. A goal of a savvy systems manager is to foresee this dilemma and to start rewriting the system so that a new system is ready to replace the aging system when the costs become too cumbersome.
[Figure 1.2: A sample data flow diagram for the fill-order process, showing data flows among the customer, the credit file and credit history, invoices and accounts receivable (A/R), backorder requests to purchasing, and inventory.]
Data dictionary: the data dictionary is a description of all of the logical data items, including all data flows and data stores (files). The data dictionary describes how all of the data items are stored and how they have been transformed by the processes. The data dictionary's file specifications also become the foundation for the relational tables that will comprise the Oracle warehouse.

Process logic specifications (structured specifications): these specifications describe all functional primitive processes. A process is defined as an operation that modifies a data flow. The tools used to describe processes include pseudocode, procedure flowcharts, decision trees, and decision tables.

In a traditional systems analysis, the DFD does not stand by itself. Rather, the DFD is augmented by a data dictionary that describes all of the data flows and files and a set of process logic specifications that describes how each process transforms data flows. A process logic specification (sometimes called a minispec) can be expressed as structured English, decision trees, or any of the many other techniques used to describe how data flows are being changed. In traditional systems analysis, data dictionary definitions for all data items are normalized or grouped into database entities, which become E/R models in the database design phase. Eventually, the E/R models become relational tables during physical design. The identification and grouping of data items constitutes the entities that will establish the basic E/R model for the database engine.
A good rule of thumb for database analysis is that a DFD should be decomposed to the level where each process corresponds to a SQL operation. This allows the use of triggers within a relational database and greatly simplifies the physical database design. As you are probably beginning to see, the level of partitioning is critical for a successful database systems analysis. While this level of decomposition is fine for traditional systems analysis, it is better to continue to decompose the behavior for effective relational database design. There is still a great deal of controversy about the best way to approach database analysis for database systems. Architecturally, some theoreticians state that the relational model is better suited for use in an online transaction processing (OLTP) environment and multidimensional architectures are better suited to data warehouses. To address these data storage issues, Oracle has implemented physical constructs that are specific to data warehouse and object-oriented systems (a range-partitioning sketch follows this list):

Oracle data warehouse physical constructs:
   Table value partitions
   Table range partitions
   Table hash partitioning
   Oracle9i database multicolumn partitioning
   Index partitioning

Oracle object-oriented physical features:
   Nested tables
   Object tables
   Abstract data types (ADTs)
   Table member methods
   Support for inheritance and polymorphism

Developers must remember that the main difference between traditional systems analysis and database analysis is the focus on the data sources and the data usage.
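To make one of the warehouse constructs above concrete, here is a minimal sketch of a range-partitioned table. The table, column, and partition names are illustrative only, not from the text:

-- A sales fact table partitioned by order date
CREATE TABLE sales_fact (
   order_date   DATE,
   item_nbr     NUMBER,
   quantity     NUMBER,
   total_cost   NUMBER
)
PARTITION BY RANGE (order_date)
(
   PARTITION sales_2002 VALUES LESS THAN
      (TO_DATE('01-JAN-2003','DD-MON-YYYY')),
   PARTITION sales_rest VALUES LESS THAN (MAXVALUE)
);

Queries that filter on order_date can then prune the untouched partitions, which is the main payoff of this construct in a warehouse.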
As the decades progressed, disks became cheaper and the rules for normalization changed dramatically. Cheaper disks made redundancy acceptable, and the deliberate introduction of redundancy is now a common physical design activity. All logical database modeling techniques have a common goal in mind. Regardless of the methodology, we see the following goals of database modeling:

Managing redundancy: controlling data redundancy was far more important during the 1970s and 1980s, when disk devices were expensive. Today, in the 21st century, disk devices are far cheaper than they have ever been, but that does not mean that we can throw away the principles of data normalization entirely. Rather, the Oracle database designer must take the logical database and introduce redundancy into the data model based on the cost of the redundancy. The cost of the redundancy is a function of the size of the redundant data item and the frequency with which the redundant item is updated.

Correctly modeling data relationships: the proper modeling of data relationships is an important part of physical database design. Within any relational database, we find a wealth of choices available to us. For example, modeling a supertype/subtype relationship can be done in about five different ways in an Oracle database. Each of these methods will allow proper storage of the data item, but with radically different internal performance and ease of maintenance.

To see how logical modeling interfaces with physical design, let's look deeper into the logical database design. For database systems, a systems developer begins by taking raw, denormalized relations from a systems analysis data dictionary. Then the developer takes the relations to 3NF and looks at the introduction of redundancy for improved performance. Of course, data redundancy becomes even more important for an Oracle warehouse developer than for a traditional OLTP designer, so we will carefully explore the options of table denormalization. We will also design a method for storing the precalculated data summaries that were defined in our systems analysis. Finally, we cannot always predict all the possible combinations of data attributes that will compose aggregate tables, so we must design a method for allowing our end users to dynamically define aggregation criteria and store the aggregate values into Oracle tables.

The process of normalization was originally intended to be a method for decomposing data structures into their smallest components. The process begins with the original data structures, which are called unnormalized relations, and progresses through 1NF to 3NF. At this stage, the data structures are completely free of redundancy and are at their most decomposed level. To fully appreciate the process, let's take a look at the successive process of normalization.
Unnormalized Form
Essentially, an unnormalized relation is a relation that contains repeating values. An unnormalized relation can also contain relations nested within other relations, as well as all kinds of transitive dependencies. Sometimes unnormalized relations are signified by 0NF, but an unnormalized relation is not to be confused with a denormalized relation. The unnormalized relation is any relation in its raw state and commonly contains repeating values and other characteristics that are not found in denormalized relations. The process of denormalization is a deliberate attempt to introduce controlled redundant items into an already normalized form.

Today, only a handful of DBMSs support repeating values, including Oracle, UniSQL DBMS, and some other databases. The relational database model requires that each column within a table contain atomic values, and until Oracle8, there was no facility for indexing multiple occurrences of a data item within a table. In any case, relations with repeating groups are supported by Oracle, and the database designer must decide when to normalize the repeating groups into new relations or use Oracle constructs to leave the repeating group inside the entity. Oracle provides two constructs for allowing repeating groups within tables: the varying-array (VARRAY) table and the nested table. VARRAY tables (Figure 1.4) have the benefit of avoiding costly SQL joins and they can maintain the order of the VARRAY items based upon the sequence in which they were stored. However, the longer row length of VARRAY tables causes full-table scans (FTSs) to run longer, and the items inside the VARRAY cannot be indexed. More importantly, VARRAYs cannot be used when the number of repeating items is unknown or very large.
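As a concrete illustration of the VARRAY construct, here is a minimal sketch; the type, table, and column names are illustrative only:

-- A repeating group of addresses stored inline as a VARRAY
CREATE OR REPLACE TYPE student_address_v AS VARRAY(3) OF VARCHAR2(80);

CREATE TABLE student (
   student_name     VARCHAR2(40),
   student_address  student_address_v
);

-- The repeating group is loaded with the type constructor
INSERT INTO student VALUES (
   'Jones',
   student_address_v('123 4th St.', 'P.O. Box 7')
);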
[Figure 1.4: A VARRAY table, with repeating student address values stored inside the student row.]
[Figure 1.5: A customer table whose rows contain a pointer column (a PL/SQL VARRAY of pointers) referencing rows in the order table.]
NESTED TABLES
Using the Oracle nested table structure, subordinate data items can be directly linked to the base table by using Oracle's newest construct: the object ID (OID). One of the remarkable extensions of the Oracle database is the ability to reference Oracle objects directly by using pointers, as opposed to relational table joins (Figure 1.5). Proponents of the object-oriented database approach often criticize standard relational databases because of the requirement to reassemble an object every time it's referenced. They make statements such as, "It doesn't make sense to dismantle your car every time you are done driving it and rebuild the car each time you want to drive it."

Oracle has moved toward allowing complex objects to have a concrete existence. To support the concrete existence of complex objects, Oracle introduced the ability to build arrays of pointers with row references directly to Oracle tables. Just as a C++ program can use the char** data structure to have a pointer to an array of pointers, Oracle allows similar constructs whereby the components of the complex objects reside in real tables with pointers to the subordinate objects. At runtime, Oracle simply needs to dereference the pointers and the complex object can be quickly rebuilt from its component pieces.

Also, notice that sometimes repeating groups are derived from the sum of other values in the transaction relation. In those cases, we must make a conscious decision whether to redundantly store these summations or have Oracle compute them at runtime. Some shops even use nested tables as an adjunct to the 1NF representation of their data (Figure 1.6). Here we see that the student-grade relationship is accessible both with a
standard relational join into the grade table or by access via the nested table.
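A minimal sketch of the nested-table alternative follows; the type and table names are illustrative, not the book's:

-- Grades stored as a nested table inside the student row
CREATE OR REPLACE TYPE grade_t AS OBJECT (
   course_name  VARCHAR2(20),
   grade        CHAR(1)
);

CREATE OR REPLACE TYPE grade_nt AS TABLE OF grade_t;

CREATE TABLE student (
   student_name  VARCHAR2(40),
   grades        grade_nt
)
NESTED TABLE grades STORE AS student_grades_nt;

-- The TABLE() operator unnests the subordinate rows at query time
SELECT s.student_name, g.course_name, g.grade
FROM   student s, TABLE(s.grades) g;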
[Figure: A star query E/R model. A central fact table (order year, quarter, month, order number, salesperson name, customer name and location, item number, quantity sold, total cost) is surrounded by customer, salesperson, item, city, state, region, year, quarter, and month dimensions.]
E/R MODELING
If we have followed the process for normalization through 3NF, we will be able to derive an E/R model that is essentially free of redundant information. As a review, the E/R model was first introduced by Professor Peter Chen of Louisiana State University and it is sometimes
called a Chen diagram. In the 25 years since the introduction of this model, many permutations have been created, but the basic principles of E/R modeling remain intact. While an E/R model may be free of redundant information, it is impossible to implement the model in a relational database without introducing redundancy to support the data relationships. For example, if a data model were implemented using a pointer-based DBMS, such as IMS, pointers would be used to establish relationships between entities. For relational databases, data columns must be copied as foreign keys to establish data relationships, thereby introducing redundancy (Figure 1.9).

[Figure 1.9: A publisher database schema with PUBLISHER, BOOK, BOOK_AUTHOR, AUTHOR, STORE, SALES, and JOB tables linked by pub_key, book_key, author_key, store_key, and job_key foreign keys.]

Hence, the physical designer must implement tools to maintain the basic nature of the E/R model, while denormalizing the data structures for high performance. Oracle offers several denormalization tools and techniques, namely materialized views and manual preaggregation. This implies that we may have several relational models within the physical schema. The new Oracle9i data model extensions provide the following physical capabilities:
Prejoining tables: this is achieved by deliberately introducing redundancy into the data model. Queries that required complex and time-consuming table joins can now be retrieved in a single disk I/O operation.

Modeling real-world objects: it is no longer a requirement for the relational database designer to model complex objects in their most atomic components and rebuild them at runtime. Using Oracle's object-oriented constructs, real-world objects can have a concrete existence. Oracle can use arrays of pointers to represent these complex objects.

Coupling of data and behavior: one of the important constructs of object orientation is the tight coupling of object behaviors with the objects themselves. In Oracle, member methods can be created upon the Oracle object. All processes that manipulate the object are encapsulated inside Oracle's data dictionary. This functionality has huge benefits for the development of all Oracle systems. Prior to the introduction of member methods, each Oracle developer was essentially a custom craftsman writing custom SQL to access Oracle information. By using member methods, all interfaces to the Oracle database are performed using pretested methods with known interfaces. Thus, the Oracle developer's role changes from custom craftsman to more of an assembly-line coder: you simply choose from a list of prewritten member methods to access Oracle information.

These physical constructs allow us to create many levels of data aggregation and enjoy the benefits of both a 3NF and a denormalized data structure (Figure 1.10).
[Figure 1.10: Levels of aggregation. Summary aggregate tables are built above the operational database, with drill-down paths back to the detailed data.]
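Materialized views are the standard Oracle tool for the prejoining described above. A minimal sketch, assuming hypothetical customer and orders tables joined on cust_nbr:

-- Prejoin two tables so queries can be satisfied in a single scan
CREATE MATERIALIZED VIEW cust_order_mv
BUILD IMMEDIATE
REFRESH COMPLETE ON DEMAND
ENABLE QUERY REWRITE
AS
SELECT c.cust_nbr, c.cust_name, o.order_nbr, o.order_date
FROM   customer c, orders o
WHERE  c.cust_nbr = o.cust_nbr;

With query rewrite enabled, the optimizer can transparently redirect joins of customer and orders to the prejoined rows.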
Instance: the physical design of the database instance is critical to the database. In Oracle, the instance region is composed of over a dozen background processes and a random-access memory (RAM) region called the System Global Area (SGA).

Table structures: the Oracle database offers a host of different table structures, including standard relational tables, VARRAY tables (non-1NF tables), nested tables, index-organized tables (IOTs), as well as table storage for objects such as video and other images.

Column structures: the Oracle database allows for a variety of different ways to define columns within the table, most notably the object-oriented concept of ADTs (user-defined data types). ADTs allow the nesting of data types within other data types to more accurately model the real world.

Indexing structures: an important part of physical database design is building indexes and other access structures with the sole intent of improving the performance of the database system. Within Oracle, these index structures can take a variety of forms, including standard B-tree indexes, bitmapped indexes, function-based indexes (FBIs), and partitioned indexes. It's up to the database designer to create the appropriate indexes on the Oracle tables to ensure the fastest performance for all SQL that runs against these tables.
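Each of the index forms just listed reduces to a one-line piece of DDL; a minimal sketch with illustrative table and column names:

-- Standard B-tree index
CREATE INDEX cust_name_idx ON customer (cust_name);

-- Bitmapped index, suited to low-cardinality columns
CREATE BITMAP INDEX cust_region_bmx ON customer (region);

-- Function-based index (FBI) for case-insensitive searches
CREATE INDEX cust_uname_idx ON customer (UPPER(cust_name));

-- On a partitioned table, appending the LOCAL keyword to the
-- CREATE INDEX statement produces a partitioned index.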
Again, the primary goal of physical database design is to translate the logical data model into a suitable physical database model. Next, we will move on and look at how specific data relationships are transformed from a logical model into a physical Oracle database design.
that it is almost trivial. In a dynamic analysis project, screens, reports, and Oracle tables can be created in minutes using wizards, Hypertext Markup Language (HTML) generators, Oracle Enterprise Manager, etc. Even a complex system with 500 screens can be prototyped by a single analyst in just a few weeks. Once we've confirmed that the system components (input screens, database objects, output screens, and reports) contain the proper data elements and mappings, we're ready for the true validation of the system: the functional validation.
Functional Validation
The most important part of any systems design is evaluating the functionality of the existing system. A functional analysis of the existing system gives the designer insights into the three major areas of functional evaluation: inputs, processes, and outputs. All systems, regardless of size, complexity, or implementation, have the following functional characteristics (a sample dictionary query follows this list):

Input functional validation: end-user interfaces, via online screens or batch interfaces (File Transfer Protocol, document submission), are validated to ensure that they accept the correct datatypes and conform to all required check constraints (as defined in the Oracle dba_constraints view).
Processing functional validation: this step examines how the inputs are transformed and stored in the database. This is the most critical area of functional evaluation because the data transformations are the core of the system. Without meaningful processes, the system is nothing more than an empty shell of prototype screens.

Output functional validation: all systems provide output. The output may be in the form of EUI (end-user interface) display, both online and batch, or stored in the database.

To properly conduct a functional evaluation, we must carefully look under the covers. The analyst must hand-execute all of the code within the system to ensure that the internal data transformations match the requirements. Remember, only the functional evaluation can tell you anything about the functional quality of the existing system. Relying solely on the requirements validation is insanity, because a system has no use without process logic to transform the data. A legitimate functional validation must include a complete description of all process logic and validation data. In plain English, the existing system must be exercised to ensure that expected inputs produce the proper outputs.
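For the input-validation step, the check constraints can be pulled straight from the data dictionary. A minimal sketch; the schema name is a placeholder:

-- List the check constraints an input screen must honor
SELECT table_name,
       constraint_name,
       search_condition
FROM   dba_constraints
WHERE  owner = 'APP_OWNER'       -- hypothetical schema owner
AND    constraint_type = 'C';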
Technology factors: the type of technology has a huge influence on the costs. A system developed in an older technology (i.e., a Common Business-Oriented Language [COBOL] system with CICS BMS screens) may cost ten times more than a state-of-the-art Java system with reusable components and HTML screens that are generated quickly with HTML generator software (FrontPage, Cold Fusion, Dreamweaver, etc.).

Productivity factors: it has long been recognized that IT developers have a nonlinear relationship between productivity and costs. On the low end, a $30-an-hour beginner has a cost/productivity factor far less than the $200-an-hour guru. This is because IT developers with more than a decade of full-time development experience know exactly how to solve a problem. Further, experienced IT developers usually have a personal library of reusable code that performs common business functions. These libraries allow the experienced developer to be far cheaper than the less expensive, inept beginner.

Cost factors: the widespread difference in costs between U.S. and overseas IT consulting can differ by an order of magnitude. In today's virtual world, many companies abandon U.S. resources and outsource their IT development efforts where experienced programmers can be acquired for less than $500 per month. As of 2003, Bangalore, India; Moscow, Russia; and Eastern Europe are all courting U.S. customers.

Before we see the problems that these variables introduce into any attempt to place an estimated cost on a system, let's examine the different methods that are employed to estimate the costs for an existing system. The following approaches have been tried in attempts to estimate the worth of an existing system:

Compare your system with a known, similar system: in this approach, common metrics are gathered for the existing system and then compared to a similar system. The common metrics might include the programming language, a subjective complexity rating for each function, and estimated productivity rates for the developers. Of course, this approach is fraught with problems, the foremost being the subjective nature of the estimates and the problem of finding a truly similar system. The subjective nature of the inputs allows you to rig the result to say anything that you desire.

Use mathematical models: there are numerous numerical tools that have been designed to attempt to determine how much money a
system may have cost. These tools include the Constructive Cost Model (COCOMO) and many others, none of which are recognized as statistically valid. All of these tools have a fatal flaw that makes them useless as a post hoc cost estimation tool. The major flaw is the subjective nature of the estimation parameters. For example, the COCOMO tool allows the analyst to specify the following nonquantifiable and subjective factors:

   Module size: this is flawed because of the widespread use of object-oriented languages such as Java. The hallmark of an object-oriented language is the ability to reuse components to greatly reduce development time. It is invalid for any tool (like COCOMO) to assume that a Java method of a given size required any specific amount of effort to produce.

   Labor rates: as we have noted, this is an invalid parameter because it does not consider the nonlinear relationship between experience and productivity.

   Effort multipliers: if the analyst does not like the resulting numbers, they need only adjust one of the subjective multipliers to receive any cost they desire.

Accept external bids: using this approach, the IT analyst presents a requirements document (or maybe a functional prototype with process logic specifications) to a series of vendors for an estimate. To avoid the immoral act of asking for an estimate under false pretenses (because you have no intention of actually accepting the estimate), the analyst engages the third-party consultancy with the promise to pay the fair market price for all estimation services if they fail to engage the consultancy to develop the system. To ensure that the external bid is accurate, the bidder is only provided with the functional analysis document and has no access whatsoever to the existing system.

As we can see, both the comparison approach and the math model approach have a problem with current-value dollars. The costs of developing a system three years ago might be quite different than the costs today. In summary, the only valid way to determine the real costs for an existing system is to commission a consultancy to bid on the system, seeing only the functional specifications. Offering up front to pay for the estimate removes the moral issue of asking for a bid when you do not intend to accept the bid. Also, not telling the consultancy that you have no intention of using the bid ensures that you receive a competitive bid.
Rollback waits:
   Incorrect sizing of Oracle redo logs
   Insufficient memory allocated to the log buffer area
   Not enough freelists assigned to tables
   Not using Oracle9i's auto segment management
   Insufficient number of rollback segments
   Not using Oracle9i's auto-UNDO management

Space problems and rollback extension:
   Invalid settings for either object space sizes or tablespace object settings (e.g., PCTINCREASE)
   Not using locally managed tablespaces (LMTs) in Oracle8 and above
   Overnormalized database design
   Incorrect PCTFREE and PCTUSED settings for objects
   Too small a database block size
   Incorrect sizing of rollback segments for the given application transactions
   Not using Oracle9i's auto-UNDO management
   Incorrect indexing scheme
   Incorrect initial sizing
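Several of the space flaws above (dictionary-managed tablespaces, manual PCTFREE/PCTUSED tuning) are avoided with locally managed tablespaces. A minimal sketch; the tablespace name, file path, and sizes are placeholders:

-- A locally managed tablespace with automatic segment-space management
CREATE TABLESPACE app_data
   DATAFILE '/u01/oradata/prod/app_data01.dbf' SIZE 500M
   EXTENT MANAGEMENT LOCAL UNIFORM SIZE 1M
   SEGMENT SPACE MANAGEMENT AUTO;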
2
PHYSICAL ENTITY DESIGN FOR ORACLE
INTRODUCTION
This chapter deals with the conversion of a logical schema into a physical design. As we noted in Chapter 1, Oracle provides a wealth of options for modeling data relationships. It's up to the physical designers to choose the most appropriate physical construct to represent their data. This chapter addresses the following topics and discusses how the logical data model can be converted into an Oracle physical design:

   Data relationships and physical design
   Hierarchical attribute design
   Object-oriented design for Oracle
   Using referential integrity (RI)

Let's begin by looking at the different physical options for modeling logical data relationships.
The effective database designer's job is to represent these types of relationships in a sensible way and ensure acceptable warehouse performance.
[Figure 2.1: The redundancy boundary. Data items are plotted by size and update frequency; large, volatile items (e.g., SERVICE_HISTORY) are poor candidates for redundancy, while small, static items (e.g., FIRST_NAME) are good candidates.]
The distributed database designer does not have free rein to introduce redundancy anywhere in the enterprise. Redundancy always carries a price, whether it is the cost of the disk storage or the cost of maintaining a parallel update scheme. Figure 2.1 shows a strategy for analyzing the consequences of data redundancy. In Figure 2.1, a boundary line lies within a range between the size of a redundant data item and the update frequency of the data item. The size of the data item relates to the disk costs associated with storing the item, and the update frequency is associated with the cost of keeping the redundant data current, whether by replication techniques or by materialized views. Because the relative costs are different for each hardware configuration and for each application, this boundary may be quite different depending on the type of application. The rapid decrease in disk storage costs means that the size boundary is only important for large-scale redundancy. A large, frequently changing item is not a good candidate for redundancy, but large static items, or small, frequently changing items, are acceptable for redundancy. Small static items (e.g., gender) represent ideal candidates for redundant duplication. As we have noted, Oracle provides a wealth of options for modeling data relationships and we must understand the ramifications of each option. Let's begin with a review of one-to-many data relationships.
[Figure: A student E/R model. The STUDENT, COURSE, GRADE, and PREREQ entities are shown along with candidate HAIR_COLOR, CITY, and ZIP_CODE entities.]
Remember, the overhead of a relational database is the requirement that actual column values be repeated to establish the data relationship. Hence, if many other data items relating to hair color are required, then it is perfectly appropriate to create another entity called HAIR_COLOR. But in this case, even though a many-to-many relationship exists between HAIR_COLOR and STUDENT, HAIR_COLOR is a stand-alone data attribute, so it is unnecessary to create an additional data structure.

Another example is the ZIP_CODE attribute in the STUDENT entity. At first glance, it appears that a violation of 3NF (i.e., a transitive dependency) has occurred between CITY and ZIP_CODE. In other words, it appears that a ZIP_CODE is paired with the CITY of residence for the STUDENT. If each city has many zip codes, while each zip code refers only to one city, it makes sense to model this as a one-to-many data relationship, and theory demands creating a separate entity called ZIP. However, this is another case where the ZIP entity lacks key attributes, making it impractical to create the ZIP entity. In other words, ZIP_CODE has no associated data items, and creating a database table with only one data column would be nonsense.

This example demonstrates that it is not enough to group together like items and then identify the data relationships. A practical test must be made regarding the presence of nonkey attributes within an entity class. If an entity has no attributes (i.e., the table has only one field), the presence of the entity is nothing more than an index to the foreign key in the member entity. Therefore, both of these pseudorelationships can be removed from the E/R model. This technique not only reduces the number of entities, but it creates a better environment for the architecture: more data is logically grouped together, resulting in less SQL join overhead. Now, let's take a look at another example of overnormalization. The goal of these examples is to give you a feel for the judgments required for proper physical design techniques.
[Figure 2.3: A fully normalized model. CUSTOMER (cust_nbr, cust_name, cust_street_address, cust_city, cust_zip_code) and ORDER (order_nbr, order_date, cust_nbr, salesperson_name) reference separate CITY (cust_city, cost_of_living, city_mascot, state_name) and STATE tables.]
This model works for most transactions on an OLTP system. However, this high degree of normalization would require the joining of the CITY and STATE tables every time that address information is requested. Consider a query to display the STATE_BIRD for all orders that have been placed for birdseed. This is a cumbersome query that requires the joining of six tables. From a schema perspective, this is because the ITEM_NAME (birdseed) is separated from STATE (STATE_BIRD) by six tables:
select state_bird
from state
natural join city
natural join customer
natural join order
natural join quantity
natural join item
Note in the example above that we are using the Oracle9i natural join feature, which allows us to remove the join criteria from the WHERE clause. What if your goal is to simplify the data structure by removing several of the one-to-many relationships? Adding redundancy poses two problems:

1. You need additional disk space for the redundant item.
2. You need a technique to update the redundant items if they are changed.

Here is a proposed solution that removes the STATE and CITY tables (Figure 2.4). Now that we have denormalized the STATE and CITY relations, we will have widespread duplication of several data items in the CUSTOMER table, namely COST_OF_LIVING, STATE_BIRD, and so on. Of course, these data items are static, so updates are not an issue. The real benefit of this denormalization is the speed of the query. Using the same STATE_BIRD query as before, you can see how it is simplified by removing the extra tables. This removes two table joins and speeds up the whole query:
select state_bird from customer natural join order natural join quantity natural join item
[Figure 2.4: The denormalized CUSTOMER table, with cost_of_living, city_mascot, state_name, state_bird, state_flower, and region_name folded into the customer row.]
It is still necessary to join three tables, but this query results in a much faster, simpler query than the original five-way table join. Of course, there are limits to massive denormalization. If you carry the denormalization concept to the extreme, you could prejoin every entity in the schema into a single, highly redundant table. Such a table would be impossible to manage because of the high degree of redundancy.
To see how a many-to-many relationship can be collapsed into a more compact structure, consider the relationship between a course and a student. We can assume that a student takes many courses and each course has many students. This is a classical many-to-many relationship and requires that we define a junction table (Figure 2.5) between the base entities to establish the necessary foreign keys. Note that the junction table is called GRADE, with the following contents:

   COURSE_ID: the primary key for the COURSE table
   STUDENT_ID: the primary key for the STUDENT table
   GRADE: a single, nonkey attribute for both foreign keys

Next, consider the question: in what context does a grade have meaning? Stating that "the grade was A in CS-101" is insufficient, and stating "Joe earned an A" makes no sense. Only when both the student number and the course number are associated does the grade column have meaning: "Joe earned an A in CS-101." In summary, the grade only makes sense in the context of both the student and the course. So, how could we denormalize such a relationship? The answer is that we could join all three tables together and create a single, highly redundant table. The advantage of this denormalization would be that all STUDENT, COURSE, and GRADE information would be available in a single disk I/O, but the downside would be the increased volume of update overhead for DML statements. In practice, this type of many-to-many relationship would be ideal for an Oracle materialized view. A sketch of the junction-table structure follows.
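A minimal sketch of the junction-table design; the table names follow the text, while the datatypes are illustrative:

CREATE TABLE student (
   student_id    NUMBER PRIMARY KEY,
   student_name  VARCHAR2(40));

CREATE TABLE course (
   course_id     VARCHAR2(10) PRIMARY KEY,
   course_name   VARCHAR2(40));

-- The junction table: all key except the single GRADE attribute
CREATE TABLE grade (
   student_id  NUMBER        REFERENCES student,
   course_id   VARCHAR2(10)  REFERENCES course,
   grade       CHAR(1),
   PRIMARY KEY (student_id, course_id));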
Figure: The CUSTOMER-ORDER-ITEM schema. One CUSTOMER places many ORDERs, and the many-to-many relationship between ORDER and ITEM is resolved through the QUANTITY junction table.
Figure: A recursive many-to-many relationship among courses (Business 101, Algebra 101, Economics 101, Calculus 200, Business 400, Accounting 305, Multivariate Statistics 450, Linear Equations 445, Advanced Topics 470, Operations Research 499), connected by is-a-prerequisite-for relationships.
Each part has many parts and, at the same time, a part may be a subpart of a larger part. With an understanding of the nature of recursive relationships, the question becomes one of implementation: What is the best way to represent a recursive relationship in Oracle and navigate the structure? The following Oracle table definitions describe the tables for the part-component example:
CREATE TABLE PART(
   part_nbr     number,
   part_name    varchar2(10),
   part_desc    varchar2(10),
   qty_on_hand  number);
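The COMPONENT junction table discussed next might be defined as follows (a sketch; the column names follow the discussion below):

CREATE TABLE COMPONENT(
   has_part   number,   -- the assembly; foreign key to PART.part_nbr
   is_a_part  number,   -- the subpart; foreign key to PART.part_nbr
   qty        number);  -- how many subparts belong in the assembly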
Look closely at the COMPONENT example. Both the has_part and is_a_part fields are foreign keys for the part_nbr field in the PART table. Therefore, the COMPONENT table is all keyed except for the qty field, which tells how many parts belong in an assembly.
Figure: Recursive many-to-many relationships and their junction tables. A PART has-parts and is-a-part-of other PARTs through the COMPONENT table; likewise, a CLASS relates to itself through a PREREQUISITE table, and a COURT-CASE cites and is cited by other cases through a CASE-CITE table (citing-case, cited-case).
Look at the following SQL code required to display all components in a Happy_Meal:
select part_name
from   part, component
where  component.has_part =
       (select part_nbr from part where part_name = 'happy meal')
and    part.part_nbr = component.is_a_part;
This type of Oracle SQL query requires joining the table against itself. Unfortunately, because all items are of the same type (e.g., PART), no real substitute exists for this type of data relationship. One helpful tool, sketched below, is Oracle's hierarchical query syntax.
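Oracle's CONNECT BY clause can walk an entire bill of materials in one statement; a minimal sketch against the COMPONENT table outlined earlier (the starting part number is an assumed value):

-- Walk every level of the assembly, starting from part number 1
select
   level,
   is_a_part
from
   component
start with
   has_part = 1
connect by
   prior is_a_part = has_part;

Each row's LEVEL pseudocolumn reports how deep the subpart sits in the assembly, which a simple self-join cannot express.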
Figure: A star schema. The fact table (STORE KEY, PRODUCT KEY, PERIOD KEY, Dollars, Units, Price) is surrounded by the Store dimension (store description, city, state, district, region, regional manager), the Product dimension (product description, brand, color, size, manufacturer), and the Period dimension (period description, year, quarter, month, day, current flag).
The star schema entails deliberate denormalization of normalized databases, which means prejoining tables to avoid the high performance costs of runtime SQL joins. In summary, the basic principle behind the star query schema is to introduce highly redundant data for performance reasons.
Figure: Database entities for a university schema, showing a COURSE entity (Course_Number) related to the students who enroll in and take scheduled classes.
The object features that matter most at the schema level are:

- ADTs
- Modeling class hierarchies (IS-A relationships)

In Chapter 7, we'll examine object tables in more detail, but for now we will limit our discussion to schema-level modeling with objects. Let's take a quick look at each of these features and see how they influence the physical design of the database.
Oracle allows us to do the same type of hierarchical grouping with its CREATE TYPE syntax.
CREATE OR REPLACE TYPE full_mailing_address_type AS OBJECT (
   Street  VARCHAR2(80),
   City    VARCHAR2(80),
   State   CHAR(2),
   Zip     VARCHAR2(10) );
Once defined, we can treat full_mailing_address_type as a valid data type and use it to create tables.
CREATE TABLE customer (
   full_name     full_name_type,
   full_address  full_mailing_address_type );
Now that the Oracle table is defined, we can reference full_mailing_address_type in our SQL just as if it were a primitive data type:
insert into customer values (
   full_name_type('ANDREW','S.','BURLESON'),
   full_mailing_address_type('123 1st st','Minot','ND','74635'));
Next, let's select from this table. Below, we see a different output than from an ordinary SELECT statement.
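A rough sketch of the query and the shape of its output (the display below is illustrative, not captured from a real session):

select full_address from customer;

FULL_ADDRESS(STREET, CITY, STATE, ZIP)
-----------------------------------------------------------------
FULL_MAILING_ADDRESS_TYPE('123 1st st', 'Minot', 'ND', '74635')

The object column is returned wrapped in its type constructor rather than as flat columns.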
As we can see, using ADTs allows for the physical representation of hierarchical data relationships, but this ability is not always used in relational design because of the obtuse SQL syntax that is required to access the data. Next, let's examine the modeling of class hierarchies in physical database design.
Figure: An HOURLY employee class with data items (wage_grade, union_name) and a method (compute_sick_time()).
Figure 2.13 A VEHICLE class hierarchy: VEHICLE specializes into CAR, BOAT, and AIRCRAFT, which are further partitioned into TRUCK and VAN, SAILBOAT and YACHT, and HELICOPTER and BLIMP.
Let's look at another example. Consider the application of the IS-A relationship for a vehicle dealership, as shown in Figure 2.13. As you can see, the highest level in the hierarchy is VEHICLE. Beneath the VEHICLE class, you might find CAR and BOAT subclasses. Within the CAR class, the classes could be further partitioned into classes for TRUCK and VAN. The VEHICLE class would contain the data items unique to vehicles, including the vehicle ID and the year of manufacture. The CAR class, because it
IS-A VEHICLE, would inherit the data items of the VEHICLE class. The CAR class might contain data items such as the number of axles and the gross weight of the vehicle. Because the VAN class IS-A CAR, which in turn IS-A VEHICLE, objects of the VAN class inherit all data items and behaviors relating to the CAR and VEHICLE classes.

These types of IS-A relationships, while valid from a data modeling viewpoint, do not have a simple implementation in Oracle. Because Oracle does not support hierarchical relationships among tables, it is impossible to directly represent the fact that a database entity has subentities. However, this type of relationship can be modeled in a relational database in two ways.

The first technique is to create subtables for car, boat, sedan, and so on. This encapsulates the data items within their respective tables, but it also creates the complication of doing unnecessary joins when retrieving a high-level item in the hierarchy. For example, the following SQL would be required to retrieve all the data items for a luxury sedan:
select
   vehicle.vehicle_number,
   car.registration_number,
   sedan.number_of_doors,
   luxury.type_of_leather_upholstery
from
   vehicle,
   car,
   sedan,
   luxury
where
   vehicle.key = car.key
and
   car.key = sedan.key
and
   sedan.key = luxury.key;
The second approach is to create a megatable, with each data item represented as a column (regardless of whether it is needed by the individual row). A TYPE column could identify whether a row represents a car, van, or sailboat. In addition, the application must have the intelligence to access only those columns applicable to a row. For example, the SAIL_SIZE column would have meaning for a sailboat row, but would be irrelevant to a sedan row. A rough sketch of this approach follows.
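A minimal sketch of the megatable (the columns shown are assumptions chosen to match the discussion):

create table vehicle (
   vehicle_number   number,
   vehicle_type     varchar2(10),  -- 'CAR', 'VAN', 'SAILBOAT', ...
   number_of_doors  number,        -- meaningful only for cars and vans
   sail_size        number);       -- meaningful only for sailboats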
Figure 2.14 Multiple inheritance in the VEHICLE hierarchy: an AMPHIBIOUS_VEHICLE class would inherit data and behaviors from both the CAR and BOAT subclasses of VEHICLE.
The IS-A relationship is best suited to the object-oriented data model, where each level in the hierarchy has associated data items and methods, and inheritance and polymorphism can be used to complete the picture. It is important to note that not all classes within a generalization hierarchy will be associated with objects. These noninstantiated classes only serve the purpose of passing data definitions to the lower-level classes. The object-oriented paradigm allows for abstraction, which means that a class can exist only for the purpose of passing inherited data and behaviors to the lower-level entities. The classes VEHICLE and CAR probably would not have any concrete objects, while objects within the VAN class would inherit from the abstract VEHICLE and CAR classes. Multiple inheritance can be illustrated by the AMPHIBIOUS_VEHICLE class. Instances of this class probably would inherit data and behaviors from both the CAR and the BOAT classes (Figure 2.14).

It is important to note one very big difference between one-to-many relationships and IS-A relationships. The IS-A construct does not imply any type of recurring association, while the one-to-many and many-to-many relationships imply multiple occurrences of the subclasses. In the previous example, the entire class hierarchy describes vehicles associated with the ITEM entity in the overall database. The fact that a class hierarchy exists does not imply any data relationships between the classes. While one customer can place many orders, it is not true that one car can have many sedans.
Figure: A STUDENT class hierarchy. STUDENT (first_name, MI, last_name; enroll_student(), compute_tuition()) has IS-A subclasses GRADUATE_STUDENT, NON_RESIDENT_STUDENT (state_of_origin, region_of_origin; compute_tuition()), and FOREIGN_STUDENT (country_of_origin, visa_expiration_date; validate_visa_status()).
Figure: An object model for order processing, with CUSTOMER (name, phone; display_cust_list()), ORDER (order number, order date; check_credit(), check_inventory(), generate_invoice()), and LINE-ITEM (item number, item name, quantity, price; add_line_item(), delete_line_item()) classes.
Materialized views are kept current on a schedule of periodic updates. Updates are accomplished by way of a refresh interval, which can range from instantaneous rebuilding of the materialized view to a hot refresh that occurs weekly. Oracle materialized views are quite complex in nature and require a significant understanding to be used effectively. Let's now cover the required setup methods and the steps for creating materialized views and appropriate refresh intervals.

In the world of database architecture, the need to dynamically create complex objects conflicts with the demand for subsecond response time. Oracle's answer to this dilemma is the materialized view. Database designers can use materialized views to prejoin tables, presort solution sets, and presummarize complex data warehouse information. Because this work is completed in advance, it gives end users the illusion of instantaneous response time. Materialized views are especially useful for Oracle data warehouses, where cross-tabulations often take hours to perform. This chapter explores the internals of materialized views and demonstrates how to precompute complex aggregates, having Oracle dynamically rewrite SQL to reference precomputed aggregate information. This is the first of two chapters concentrating on Oracle materialized views.
create table avg_price
as
select
   item_description,
   avg(price) avg_price
from
   fact
group by
   item_description;
create table sales_by_state
as
select
   state_abbreviation,
   sum(sales) sum_sales
from
   fact
group by
   state_abbreviation;
Prior to materialized views, DBAs using summaries spent a significant amount of time manually identifying which ones to create; then creating, indexing, and updating them; and then advising their users about which ones to use. Figure 2.17 illustrates the process of preaggregation. The problem with manually creating summary tables is that you have to tell the end user to go to the new table; there was no Oracle mechanism to automatically rewrite the SQL to go to the precreated summary. Materialized views provide an alternate approach.

Materialized views are popular in Oracle systems where performance is critical and complex SQL queries exist against large tables. Generally, we see materialized views used in two areas:

1. Aggregation
2. Replication

In terms of aggregation, materialized views improve query speed by rewriting a query against the base table into a query against the preaggregated summary table via the following:

- Precalculated summaries - the rollup, cube, sum, avg, min, max, count(*), and count(distinct x) functions can now be used to presummarize data.
- Prejoined tables - tables can be prejoined to substantially improve performance.

For example, the manual summary above can be recast as a materialized view, as sketched below.
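A minimal sketch, reusing the sales_by_state example (the refresh options shown are illustrative):

create materialized view sales_by_state_mv
build immediate
refresh complete on demand
enable query rewrite
as
select
   state_abbreviation,
   sum(sales) sum_sales
from
   fact
group by
   state_abbreviation;

With enable query rewrite, a query that sums sales from the fact table can be transparently redirected to the presummarized rows, with no change to the end user's SQL.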
It is important to note that a materialized view is a form of replication. From the moment that the materialized view is created, it can become stale if any of the base tables' data is changed. Hence, Oracle has incorporated its snapshot concept with materialized view technology, such that all forms of replication are considered materialized views. Below we see the Oracle create snapshot syntax. Note that we get a reply from Oracle stating "Materialized View Created."
create snapshot cust_snap
refresh fast
start with sysdate
next sysdate + 1/1440
as
select * from customer@remote;
In the above example, we used an Oracle hint to ensure that the materialized view is referenced.
REFERENTIAL INTEGRITY
Oracle databases allow for the control of business rules with constraints. These RI rules ensure that one-to-many and many-to-many relationships are enforced within the distributed relational schema.
Figure: The query rewrite decision. An incoming SQL query is checked to see whether a rewrite is possible; if so, the query is rewritten against the materialized view.
For example, a constraint could be used to ensure that orders are not placed for nonexistent customers or to ensure that a customer is not deleted until all of their orders have been filled. Relational systems allow control of business rules with constraints, and RI rules form the backbone of relational tables. For example, in Figure 2.19, RI ensures that a row in the CUSTOMER table is not deleted if the ORDER table contains orders for that customer. It is clear that enforcing the business rule in Figure 2.19 is a real challenge. While it's relatively simple to tell an Oracle system not to delete a row from its CUSTOMER table if rows for that customer exist in the
Figure 2.19 An RI rule: ORDER.CUST_NAME references CUSTOMER.CUST_NAME. Two options exist - ON DELETE RESTRICT (customers may not be deleted while they have orders in the ORDER table) and ON DELETE CASCADE (deleting a customer deletes all of that customer's orders).
ORDER table, it's not simple to enforce this rule when the CUSTOMER table resides in a Sybase database and the ORDER table resides within Oracle.

Before most relational databases supported RI, it was the responsibility of the programmer to guarantee the maintenance of data relationships and business rules. While this was fine for the applications, the risk came into play when ad hoc SQL update commands were issued using Oracle's SQL*Plus software. With these ad hoc update tools, the programmatic SQL could be easily bypassed, skipping the business rules and creating logical corruption.

RI has earned a bad reputation in Oracle because of the overhead that is created when enforcing the business rules. In almost every case, it will be faster and more efficient to write your own rules to enforce RI instead of having Oracle do it for you. Provided that your application doesn't allow ad hoc query, it's relatively easy to attach a trigger with a PL/SQL routine to enforce the RI on your behalf. In fact, this is one of the best uses of a trigger, since the DML DELETE event will not take place if the RI rules are invalid. For example, consider the foreign key constraint that protects a customer from being deleted if they have outstanding orders:
create table customer (
   cust_id       number,
   cust_name     varchar(30),
   cust_address  varchar(30));
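The companion order table carrying the foreign key might look like the following sketch (the names are assumptions; ORDER itself is an Oracle reserved word, so a variant name is used):

-- cust_id must first be a primary (or unique) key to be referenced:
alter table customer
   add constraint customer_pk primary key (cust_id);

create table cust_order (
   order_nbr   number,
   order_date  date,
   cust_id     number
      constraint fk_cust references customer(cust_id));

With the REFERENCES clause in place, an attempt to delete a customer who still has rows in cust_order is rejected by Oracle.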
Several types of constraints can be applied to Oracle tables to enforce RI, including the following:

- CHECK constraint - this constraint validates incoming columns at row insert time. For example, rather than having an application verify that all occurrences of REGION are north, south, east, or west, a CHECK constraint can be added to the table definition to ensure the validity of the region column.
- NOT NULL constraint - this constraint is used to specify that a column may never contain a NULL value. This is enforced at SQL INSERT and UPDATE time.
- PRIMARY KEY constraint - this constraint is used to identify the primary key for a table. This operation requires that the primary column is unique, and Oracle will create a unique index on the target primary key.
- FOREIGN KEY constraint - this is the foreign key constraint as implemented by Oracle. A foreign key constraint is only applied at SQL INSERT and DELETE times. For example, assume a one-to-many relationship between the CUSTOMER and ORDER tables, such that each CUSTOMER may place many ORDERs, yet each ORDER belongs to only one CUSTOMER. The REFERENCES constraint tells Oracle at INSERT time that the value in ORDER.CUST_NUM must match the CUSTOMER.CUST_NUM in the customer row, thereby ensuring that a valid customer exists before the order row is added. At SQL DELETE time, the REFERENCES constraint can be used to ensure that a CUSTOMER is not deleted if rows still exist in the ORDER table.
- UNIQUE constraint - this constraint is used to ensure that all column values within a table never contain a duplicate entry. Note the distinction between the UNIQUE and PRIMARY KEY constraints. While both of these constraints create a unique index, a table may only contain one PRIMARY KEY constraint column, but it may have many UNIQUE constraints on other columns.

The sketch below shows all five constraint types together.
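A minimal sketch (the table and column names are illustrative):

create table customer (
   cust_num    number        primary key,   -- PRIMARY KEY constraint
   cust_name   varchar2(30)  not null,      -- NOT NULL constraint
   cust_email  varchar2(50)  unique,        -- UNIQUE constraint
   region      varchar2(5)
      check (region in ('north','south','east','west')));  -- CHECK

create table cust_order (
   order_nbr   number primary key,
   cust_num    number references customer(cust_num));  -- FOREIGN KEY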
Note: It is a critical point that RI cannot be maintained in a denormalized schema unless materialized views are built against the 3NF representation of the data.
CONCLUSION
This chapter has dealt with schema-level physical design for Oracle databases and looked into the issues relating to the physical modeling of data relationships. We explored one-to-many, many-to-many, and recursive relationships, as well as object-oriented constructs. The main points of this chapter include:

- Normalization is key - proper denormalization (prejoining tables) is critical to high-speed Oracle performance.
- Multiple representations are available - for cases where the data must appear in both denormalized and normalized forms, materialized views can be used to synchronize the data.
- Use object-oriented constructs when appropriate - Oracle provides a wealth of object-oriented database constructs, such as ADTs, that can be used to simplify implementation.

We are now ready to explore the physical design issues relating to hardware and see how the physical database designer must consider the environment for the database.
3
ORACLE HARDWARE DESIGN
INTRODUCTION
This chapter is devoted to some of the hardware design issues associated with implementing a successful Oracle database. The discussion will focus on the physical design of the database and how it relates to available hardware. Oracle9i provides a wealth of features and tools that allow the designer to tailor database response for the needs of a particular application. These are also explored. The main topics of this chapter include:

- Planning the server architecture
- Hardware design and central processing unit (CPU) issues
- Hardware design and RAM issues
- Oracle network design
- Server design and disk allocation

Designing a server environment entails making decisions about the basic hardware requirements, as well as selecting the more advanced database configuration methods necessary to effectively interact with the available hardware. The designer needs to understand the demands that the server places on CPU resources, which is discussed in detail in this chapter. The discussion then moves to a consideration of the demands that the server places on RAM resources. The focus then shifts to server design as it relates to general memory issues. The chapter concludes with a detailed look at designing the overall network, including connectivity concerns.

Although Oracle offers many performance-tuning techniques, you can't tune away a poor database design, especially a poor architectural design. So, it is imperative that the Oracle database designer understand (from
the inception of the project) how to create robust Oracle data architectures that can retrieve information as rapidly as possible while preserving maintainability and extensibility.

If you strip away all the complex methodology and jargon surrounding the Oracle database, one simple factor remains: disk I/O. Disk I/O is the most expensive Oracle database operation. Oracle design professionals should always remember to design data architectures that retrieve the desired information with a minimal amount of disk access. This section shares some of the tricks I use to ensure Oracle hardware architecture designs perform at optimal levels while producing a design that is easy to maintain and extend:

- Use RAM data caching - you must be aware that Oracle9i allows large memory regions to cache frequently referenced row information. The caching of frequently referenced information should be a major design goal, primarily because RAM access is orders of magnitude (more than 10,000 times) faster than row access from disk. The larger the Oracle data block buffer cache, the faster the SQL queries will execute. The size of the RAM data buffers will have a direct impact on Oracle performance, and all systems run fastest when fully cached in the data buffers.
- Buy fast processors - the CPU speed of the Oracle database server has a direct impact on performance. High-performance 64-bit CPUs will often perform 10 times faster than 32-bit processors. The 64-bit processors are available on all major platforms: the Windows operating system (OS) with the Intel Itanium processor, HP/UX with the PA-8000 processor, the Solaris operating environment with the 500-MHz UltraSPARC-IIe processor, and the IBM AIX OS with the RS/6000 PowerPC processor.
- Use a 64-bit version of Oracle - it is highly recommended that Oracle systems exist on a dedicated database server with a 64-bit CPU architecture and a 64-bit version of Oracle. The 64-bit version of Oracle lets you create large SGA regions, and large projects commonly require more than 20 gigabytes (GB) of RAM data buffers. A serious shortcoming of 32-bit Oracle is the 1.7 GB size limitation for the SGA.
- Design for faster SGA access - one of the foremost reasons stored procedures and triggers function faster than traditional code is related to the Oracle SGA. After a procedure has been loaded into the shared pool of the SGA, it remains until it is paged out of memory to make room for other stored procedures. Items are paged out based on a least recently used (LRU) algorithm. Once loaded into the RAM memory of the shared pool, procedures will
execute quickly; the trick is to prevent pool thrashing as many procedures compete for a limited amount of shared-pool memory. Stored procedures load once into the shared pool and remain there unless they become paged out. Subsequent executions of the stored procedure are far faster than executions of external code.

One of the trademarks of a superior Oracle designer is the ability to create an overall architecture that is robust, maintainable, and efficient. Today's Oracle design professionals are required to design systems that may support thousands of transactions per second while at the same time delivering subsecond response time, easy maintenance, and extensibility. With a thorough understanding of Oracle9i database features and the help of the tips presented in this chapter, you can build an appropriate data model architecture that supports the requirements of end users. Let's start with a review of important issues for planning the Oracle server environment.
UNIX servers handle tasks according to their internal dispatching priority, and UNIX OS tasks will obviously have a higher dispatching priority. CPU overload is typically indicated by high values in the vmstat run queue column. If the run queue value exceeds the number of CPUs available to the server, some tasks may await completion. There are several options available for managing CPU shortages:

- Add additional processors.
- Reduce server load.
- Turn off Oracle parallel query.
- Replace the standard Oracle listener with the multi-threaded server (MTS).
- Alter task dispatching priorities.
- Upgrade the server.

Symmetric multiprocessor configurations for Oracle database servers are usually expandable, and additional processors can be added at any time. The new CPUs are immediately made available to the Oracle database by the processor architecture. The disadvantage of adding processors is the high cost, which is often greater than the cost of a new server. Reduced productivity due to increased response time can be compared with the cost of additional processors by performing a cost-benefit analysis to determine the feasibility of adding more processors.

CPU overloads can be sporadic, which complicates justifying additional processors. Overloads are often transient or momentary, heavily burdening the server at certain times while leaving processing resources only partially utilized at other times. A load-balancing analysis can be performed to ensure that batch-oriented tasks are sent to the server at nonpeak hours. The nature of the individual business is a factor here. For example, if the business is conducted primarily online, the response time when the online users are active is the only important one. The fact that the server may be idle late at night has no bearing on the decision to add CPUs.
Batch tasks can be rescheduled to nonpeak hours with any of the following tools (a dbms_job sketch follows below):

- The dbms_job utility
- The UNIX cron utility
- A transaction processing monitor, such as the Tuxedo application
- Oracle Concurrent Manager (for Oracle Applications)

Most OSs allow the root user to change the task dispatching priority. In general, the online database background tasks are given greater priority (a smaller priority value), while less critical batch processes are assigned less priority (a higher priority value).

Note: In the long run, it's not a satisfactory solution to alter the default dispatching priorities. The default priorities should be changed only in emergency situations.
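A minimal dbms_job sketch for moving a batch routine to midnight (refresh_summaries is a hypothetical stored procedure):

variable jobno number;
begin
   dbms_job.submit(
      job       => :jobno,
      what      => 'refresh_summaries;',
      next_date => trunc(sysdate) + 1,      -- first run at midnight
      interval  => 'trunc(sysdate) + 1');   -- rerun every midnight
   commit;
end;
/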
For example, even though the CPU seems to be 99 percent idle, the RAM is clearly overloaded in Table 3.1.
The HP/UX vmstat listing below should be carefully reviewed. The scan rate (sr) is the far right-hand column. We can see the value of sr rising steadily as the paging operation prepares to page-in. RAM on the Oracle server is exceeded as the sr value peaks and the page-in operation begins.
root> vmstat
procs           memory                      page
 r  b  w     avm    free   re  at  pi  po  fr  de  sr
 3  0  0  144020   12778   17   9   0  14  29   0   3
 3  0  0  144020   12737   15   0   1  34   4   0   8
 3  0  0  144020   12360    9   0   1  46   2   0  13
 1  0  0  142084   12360    5   0   3  17   0   0  21
 1  0  0  142084   12360    3   0   8   0   0   0   8
 1  0  0  140900   12360    1   0  10   0   0   0   0
 1  0  0  140900   12360    0   0   9   0   0   0   0
 1  0  0  140900   12204    0   0   3   0   0   0   0
 1  0  0  137654   12204    0   0   0   0   0   0   0
The most important columns in this report are:

- r - the run queue value, showing the number of processes waiting in the CPU dispatcher
- pi - the page-in values, showing OS page-in operations from the swap disk
- po - the number of page-out operations of the OS to the swap disk, in anticipation of a possible page-in

Before looking further into excess memory demands on the server, we need to determine how much memory is available on the server. Once we know the amount of RAM on our server, we can investigate the Oracle server's RAM and swap disk usage. Whenever the memory demands of the server exceed the amount of RAM, the virtual memory facility is invoked. Virtual memory moves segments of RAM onto the special swap disk, which holds excess RAM memory contents and whose parameters are defined by the system administrator. It is common practice for the virtual memory system to page-out memory segments, and this does not indicate a memory problem. However, a page-in operation does indicate that the server has exceeded the amount of available RAM and that memory segments are being recalled from the swap disk. During swapping (page-in), data usually stored in RAM is read from the swap disk back into memory, which slows down a server. The solution to a page-in problem on an Oracle database server involves:

- Smaller SGA - the demand for RAM is reduced by making the SGA smaller. The size of the SGA was decreased in Oracle8i and
earlier versions by reducing db_block_buffers; Oracle9i reduces it via the db_cache_size, sga_max_size, db_xK_cache_size, shared_pool_size, and java_pool_size init.ora parameters. This frees RAM for the rest of the server. (Remember that some 32-bit versions of Oracle cannot use more than 1.7 GB of RAM.)
- Reduce RAM demand - the amount of RAM consumed by a database server can be decreased by reducing the demands on the PGA. Parameters such as sort_area_size greatly increase the amount of RAM allocated to each user's PGA, so reducing them reduces overall demand.

A memory-bound database server is always subject to paging from the swap disk. The vmstat utility displays paging in the po and pi columns. The following output indicates that the database server is burdened by nine page-out and five page-in operations. The page-in operations show that the server is suffering excessive memory requests.
root> vmstat 1 2 kthr memory page faults cpu ----- ----------- ----------------------- ------------ ---------r b avm fre re pi po fr sr cy in sy cs us sy id wa 0 0 218094 166 0 4 0 4 16 0 202 301 211 14 19 45 22 0 0 218094 166 0 5 9 4 14 0 202 301 211 14 19 45 22
To summarize, page-out operations are a normal part of virtual memory operation, but page-in operations indicate excessive RAM demands on the server. To derive optimum performance for a database application, the designer must carefully weigh the issues of server CPU demand and server demand on RAM resources. We have focused on the techniques and tools that the designer must employ to effectively integrate server design with actual memory consumption. The remainder of the chapter considers how to design the network for seamless interaction with the user, as well as how to design the network for the best possible level of performance. Now that we understand the basics of server design for RAM, let's explore network design issues for Oracle databases.
stack for transmission. The protocol stack in turn creates a packet and sends it over the network. In reality, Oracle Net can do little to improve performance, with a few minor exceptions, since network traffic and tuning are addressed outside of the Oracle environment. Oracle DBAs do have control over the size and frequency of network packets, and many tools are available to change the packet size and the frequency with which packets are sent over the network. For example, larger amounts of data can be sent over the network less often by changing the refresh interval for a snapshot.

The remainder of the section is devoted to the issues involved with successful network design, emphasizing the tools available to the designer to tune performance. Understanding the use of these tools will facilitate the optimum design of the desired configuration. The following material is divided into two major sections:

1. Optimizing Oracle Net configuration
2. Other Oracle features that affect network performance

The first section discusses the tuning parameters that are used with the Oracle Net layer. The second explores some salient Oracle features that can be used to fine-tune network performance.
Oracle Net waits for the buffer to fill. A protocol.ora file that specifies tcp.nodelay will end delays in the buffer-flushing process for all Transmission Control Protocol/Internet Protocol (TCP/IP) implementations. This parameter may be applied to both the client and the server. The protocol.ora statement is:
tcp.nodelay = yes
All requests are sent immediately when this parameter is specified. This parameter can cause the network to run slower in some cases, so Oracle recommends that tcp.nodelay be used only when encountering TCP timeouts. Because network packets will be transmitted more frequently between the client and server, network traffic can increase. Nevertheless, using tcp.nodelay can offer a huge performance improvement between database servers during times of high-volume traffic.
The automatic_ipc parameter should only be used on the database server when an Oracle Net connection must be established with the local database. The parameter should be set to off if no local database connections are needed; all Oracle Net clients should take advantage of this setting to improve performance, as in the sketch below.
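A one-line sqlnet.ora sketch for a client machine with no local database (file placement follows standard Oracle Net conventions):

automatic_ipc = off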
The break_poll_skip parameter affects the amount of CPU consumed on the Oracle Net client. It is a client-only parameter. Some general rules for using this parameter are:
- The higher the value of break_poll_skip, the less frequently Ctrl-C is checked and the less CPU overhead used.
- Conversely, the lower the value of break_poll_skip, the more frequently Ctrl-C is checked and the more CPU overhead used.
- This parameter only functions on servers that support in-band breaks and is only useful in an Oracle Net client sqlnet.ora file.
The SDU and TDU parameters may be set to smaller values for users who connect over modem lines because of the frequent resends that can occur over dial-up connections. If the MTS is used, the mts_dispatchers parameter must also be set with the proper SDU and TDU configuration. An example of the parameters on a Token Ring network with an MTU of 4202 bytes follows.

listener.ora:
SID_LIST_LISTENER = (SID_LIST = (SID_DESC = (SDU = 4202) (TDU = 4202) (SID_NAME = ORCL) (GLOBAL_DBNAME = ORCL.WORLD) ) )
tnsnames.ora
ORCL.WORLD = (DESCRIPTION = (SDU=4202) (TDU=4202) (ADDRESS = (PROTOCOL = TCP) (HOST = fu.bar) (PORT = 1521) ) (CONNECT_DATA = (SID = ORCL)) )
The Oracle8i database automatically registers instances with the listener unless one of the following is done:

- Implement the MTS and define the mts_dispatchers in your init.ora file:
MTS_DISPATCHERS="(DESCRIPTION=(SDU=8192)(TDU=8192)\ ADDRESS=(PARTIAL=TRUE)(PROTOCOL=TCP)(HOST=supsund3)))\ (DISPATCHERS=1)"
67
- Use service_name=global_dbname in the Connect_Data section of the tnsnames.ora file, where global_dbname is configured in listener.ora. Note: global_dbname disables Transparent Application Failover (TAF), which does not support it; the Oracle Net Administrator's Guide provides more information under "Configuring Transparent Application Failover."
- Do not use automatic service registration. Set the init.ora parameter local_listener to use a different TCP port than the one defined in your listener.ora file.
The queuesize Parameter in listener.ora

If it is expected that the listener will receive large numbers of connection requests, a queue may be specified for the process. This enables the listener to handle larger numbers of simultaneous connection requests. The number of requests the listener can store while Oracle works to establish a connection is specified by the queuesize parameter. The value of this parameter should be equivalent to the number of expected simultaneous connections. Below is an example of the queuesize parameter in the listener.ora file:
LISTENER = (ADDRESS_LIST = (ADDRESS = (PROTOCOL = TCP) (HOST = marvin) (PORT = 1521) (QUEUESIZE = 32) ) )
Use of queuesize can be disadvantageous, since more resources and memory are used. The parameter preallocates resources for anticipated connection requests. For this reason, if high-volume connections into a dedicated listener are anticipated, it may be beneficial to implement the MTS and use prespawned Oracle connections.
Connection Pooling and Network Performance

An MTS dispatcher requires fewer physical network connections, and connection pooling allows the administrator to take full advantage of this. This
68
resource utilization feature is achieved by sharing a dispatcher's set of connections among multiple client processes. Connection pooling reuses physical connections: older connections are made available for incoming clients while a logical session with the previous idle connection is maintained. Connections that have been idle for a specified period of time are temporarily released by a timeout mechanism. The idle connection can be used by another process until the previous client resumes work, at which time another physical connection is established with the dispatcher. Connection pooling is disabled on both incoming and outgoing network connections by default. To enable connection pooling, add the POOL argument to the mts_dispatchers parameter in the init.ora file.
MTS_DISPATCHERS = "(PROTOCOL=TCP)(DISPATCHERS=3)(POOL=3)"
Connection pooling is enabled for both incoming and outgoing network connections whenever a number is specied. The number sets the timeout in ticks for both types of network connections. To enable connection pooling for both incoming and outgoing networks while using the Oracle Net default timeout, set the POOL argument to ON, YES, TRUE, or BOTH.
MTS_DISPATCHERS = "(PROTOCOL=TCP)(DISPATCHERS=3)(POOL=ON)"
The POOL argument IN or OUT enables connection pooling for incoming or outgoing network connections respectively. The Oracle Net default timeout will be used.
MTS_DISPATCHERS = "(PROTOCOL=TCP)(DISPATCHERS=3)(POOL=IN)"
In practical administration, connection pooling is used rarely unless the database server is overwhelmed with incoming Oracle Net requests.
Databases that are subject to high traffic may have performance problems when ODBC is used. Many Oracle applications experience increased overhead because connecting via ODBC is less efficient than using a native API call to the database. It is therefore recommended that ODBC be replaced with a native communications tool, such as the Oracle Call Interface (OCI). ODBC generally works fine for occasional database queries, but it is too slow to be a permanent fixture in most production applications.
CONCLUSION
We have analyzed the tools and techniques that the designer must use to build a successful database environment. The designer must balance the requirements imposed by the hardware with a particular configuration of the database to adequately fulfill the purpose of the application, ensure the most efficient operation, and maximize the level of performance. Careful attention to the tuning methods described will reward the database designer with an efficient design that meets the needs of the client, while ensuring the optimum utilization of the available resources. Next, let's look at Oracle design for the instance. The running instance is governed by the SGA, and proper SGA design is a critical aspect of Oracle design.
Table: Instance-design flaws and the waits they cause (rollback waits, I/O waits, and rollback extension), with causes including: incorrect sizing of Oracle redo logs; insufficient memory allocated to the log buffer area; not enough freelists assigned to tables; not using Oracle9i's auto segment management; an insufficient number of rollback segments; not using Oracle9i's auto-UNDO management; not separating tables and accompanying indexes into different tablespaces on different physical drives; not placing the SYSTEM tablespace on a little-accessed physical drive; placing the tablespace used for disk sort activity on a RAID 5 drive or heavily accessed physical volume; too many long table scans from an invalid indexing scheme; not enough RAM devoted to the buffer cache memory area; invalid object placement using Oracle8's KEEP and RECYCLE buffer caches; and not keeping small lookup tables in cache using the CACHE table parameter.
4
ORACLE INSTANCE DESIGN
INTRODUCTION
The design of the Oracle instance can have a profound impact on the performance of the database. Starting in Oracle9i, you have the option of dynamically changing the SGA regions, but you still must design the instance configuration to handle your system's most common processing loads. Because of the huge changes between Oracle releases, we will explore the design issues for Oracle8i and Oracle9i, and then explore the new Oracle Database 10g features for automatic SGA memory management. If you are using Oracle9i, there are three ways to self-tune the instance:

- Normal scheduled reconfiguration - a bimodal instance that performs OLTP and DSS during regular hours will benefit from a scheduled task to reconfigure the SGA and PGA.
- Trend-based dynamic reconfiguration - you can use Statspack to predict those times when the processing characteristics change and use the dbms_job package to fire ad hoc SGA and PGA changes.
- Reactive reconfiguration - just as Oracle9i dynamically redistributes RAM memory for tasks within the pga_aggregate_target region, the Oracle DBA can write scripts that steal RAM from an underutilized area and reallocate these RAM pages to another RAM area.

To illustrate, here is a script that can be used to change the SGA configuration when your processing requirements change:
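A minimal sketch of such a reconfiguration (the sizes are illustrative assumptions; each parameter is dynamically alterable in Oracle9i):

-- Shift RAM away from the data buffers and shared pool to give
-- the evening batch jobs a larger PGA work area:
alter system set db_cache_size = 300m;
alter system set shared_pool_size = 150m;
alter system set pga_aggregate_target = 800m;

Scheduled through dbms_job, one such script can run at the close of business and a mirror-image script can restore the OLTP settings each morning.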
Let's review some guidelines for sizing the SGA and PGA regions.
- 2 megabytes (MB) of RAM session overhead, plus sort_area_size, plus hash_area_size
- Oracle SGA RAM - this is determined by the Oracle parameter settings. The total is easily found by either the show sga command or the value of the sga_max_size parameter.

We should subtract 20 percent from the total available RAM to allow for Windows overhead. Windows uses RAM resources even when idle, and the 20 percent deduction is necessary to get the real free RAM on an idle server. Once the amount of RAM on the server is known, we will be in a position to size the Oracle database for RAM usage. First, we need to know the high water mark (HWM) of Oracle connections. As noted previously, each session connected to the Windows server requires a memory region for the PGA, unless Oracle's MTS architecture or pga_aggregate_target is utilized. The HWM of connected Oracle sessions can be determined in several ways. One popular method uses Oracle logon and logoff system-level triggers to record sessions in a statistics table, as sketched below. Another method uses Oracle Statspack to display the values from the stats$sysstat table or the v$resource_limit view (only after release 8.1.7, because of a bug).
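A rough sketch of the logon-trigger method (stats$user_log is a hypothetical audit table):

create table stats$user_log (
   user_id    varchar2(30),
   logon_day  date,
   host       varchar2(60));

create or replace trigger logon_audit_trigger
after logon on database
begin
   -- record each connection so the session high-water mark can be
   -- derived later by counting rows per time period
   insert into stats$user_log
   values (user, sysdate, sys_context('USERENV','HOST'));
end;
/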
The alter session command instructs UNIX to expand the PGA sort area as the sort requires. If external PGA RAM is used, Oracle issues the malloc() command, creating a RAM sort area. The RAM sort area is not allocated until the retrieval from the database has been completed, and the memory only exists for the duration of the sort. In this way, the RAM is only allocated when Oracle needs it, and the memory demands on the server are reduced.
A quick dictionary query (pga_size_each.sql) against the v$parameter view will yield the correct value for each PGA RAM region size.
set pages 999;

column pga_size format 999,999,999

select 2048576+a.value+b.value pga_size
from   v$parameter a, v$parameter b
where  a.name = 'sort_area_size'
and    b.name = 'hash_area_size';
The data dictionary query output shows that the Oracle PGA will use 3.6 MB of RAM memory for each connected Oracle session.
PGA_SIZE -----------3,621,440
If we now multiply the number of connected users by the PGA demands for each user, we will know exactly how much RAM should be reserved for connected sessions. Alternatively, we could issue an SQL statement to obtain the same result. The script for such a statement is shown below.
select &hwm*(2048576+a.value+b.value) pga_size from v$parameter a, v$parameter b where a.name = 'sort_area_size' and b.name = 'hash_area_size' ;
Running the script, we see that we are prompted for the HWM. We will assume that the HWM of connected sessions to the Oracle database server is 100. Oracle will do the math and display the amount of RAM to reserve for Oracle connections.
SQL> @pga_size

Enter the high-water mark of connected users: 100

old   2:   &hwm*(2048576+a.value+b.value) pga_size
new   2:   100*(2048576+a.value+b.value) pga_size
PGA_SIZE -----------362,144,000
Returning to our example Windows server, we are ready to calculate the optimum SGA size. Multiplying 100 by the amount needed for each PGA region (3.62 MB) and adding the 2 MB PGA overhead gives us a total PGA size of 364 MB. The maximum size for the SGA is determined by subtracting the total PGA and the OS overhead from the total RAM on the server. Here is a summary:

Total RAM on Windows server                  1250 MB
Less:
   Total PGA regions for 100 users            364 MB
   RAM reserved for Windows (20 percent)      250 MB
Maximum SGA size                              636 MB
This leaves 636 MB of free memory for the SGA. Therefore, the RAM allocated to the data buffers should be adjusted to make the SGA size less than 636 MB. If the SGA size is greater than 636 MB, the server will begin to page RAM, impairing the performance of the entire server. We
also see that the total Oracle RAM is 1000 MB, equivalent to the total PGA plus the total SGA. Next, let's review the SGA instance parameters and see how the proper design is used to optimize performance.
the sort and hash areas with PGA RAM. The AMM uses real-time workload data from the automatic workload repository (AWR) and changes the sizes of the shared pool and data buffers according to the current workload.

- sga_target - just like pga_aggregate_target, setting this parameter allows the Oracle SGA to dynamically change itself as processing needs change. The sizes of shared_pool_size and db_cache_size will adjust based upon current and historical processing requirements.

As we see, the SGA design is highly dependent on the version of Oracle that you are running. Next, let's examine the design of the shared pool region.
As we can see, the addition of the host variable makes the SQL statement reusable and reduces the time spent in the library cache, improving the overall throughput and performance of the SQL statement. In severe cases of nonreusable SQL, many Oracle DBAs will periodically issue the alter system flush shared_pool command to remove all of the nonreusable SQL and improve the performance of SQL statements within the library cache.
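For instance, a statement written with a bind (host) variable is parsed once and shared by every execution (the table and variable names here are illustrative):

-- Nonreusable: each new literal creates a new library cache entry
select customer_name from customer where cust_id = 123;

-- Reusable: one shared entry, regardless of the value supplied
select customer_name from customer where cust_id = :v_cust_id;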
pseudotable that keeps information about library cache activity. The table has three relevant columns: namespace, pins, and reloads. The namespace column indicates whether the measurement is for the SQL area, a table or procedure, a package body, or a trigger. The pins column counts the number of times an item in the library cache is executed. The reloads column counts the number of times the parsed representation did not exist in the library cache, forcing Oracle to allocate the private SQL areas in order to parse and execute the statement. Listing 4.1 is an example of a SQL*Plus query to interrogate the V$LIBRARYCACHE table and retrieve the necessary performance information. When we run this script, we see all of the salient areas within the library cache, as shown below.
Low data dictionary object hit ratio
Rollup by hour

                                    Data        Data       Data    Dictionary
                                 Dictionary    Object     Cache        Hit
Yr.  Mo Dy Hr  PARAMETER            Gets       Misses     Usage      Ratio
-------------- ----------------- ----------  ---------- ---------- ----------
2003-08-23 12  dc_histogram_defs      2,994         980        600         67
2003-08-23 23  dc_histogram_defs      2,567         956        432         63
Again, we must always check to see if any component of the shared pool needs to be increased. Next, let's take a look at how instancewide sort operations affect the performance of the Oracle database.
column c2 heading "Cache Misses|While Executing"
column c3 heading "Library Cache|Miss Ratio"

break on mydate skip 2;

select
   to_char(snap_time,'yyyy-mm-dd HH24')    mydate,
   sum(new.pins-old.pins)                  c1,
   sum(new.reloads-old.reloads)            c2,
   sum(new.reloads-old.reloads)/
   sum(new.pins-old.pins)                  library_cache_miss_ratio
from
   stats$librarycache old,
   stats$librarycache new,
   stats$snapshot     sn
where
   snap_time > sysdate-&1
and
   new.snap_id = sn.snap_id
and
   old.snap_id = new.snap_id-1
and
   old.namespace = new.namespace
group by
   to_char(snap_time,'yyyy-mm-dd HH24')
having
   sum(new.reloads-old.reloads)/
   sum(new.pins-old.pins) > .05;
Listing 4.2 Using the Oracle Statspack Utility to Monitor a Too-Small Shared Pool
-- prompt ***********************************************************
-- prompt
-- prompt Excessive event waits indicate shared pool contention
-- prompt
-- prompt ***********************************************************

ttitle 'High event waits|Check for shared pool contention'

set pages 999;
set lines 80;

column mydate        heading 'Yr.  Mo Dy  Hr'  format a13;
column event                                   format a30;
column waits                                   format 99,999,999;
column secs_waited                             format 999,999,999;
column avg_wait_secs                           format 999,999;
break on to_char(snap_time,'yyyy-mm-dd') skip 1;

select
   to_char(snap_time,'yyyy-mm-dd HH24')            mydate,
   e.event,
   e.total_waits - nvl(b.total_waits,0)            waits,
   ((e.time_waited_micro - nvl(b.time_waited_micro,0))/100) /
   nvl((e.total_waits - nvl(b.total_waits,0)),0)   avg_wait_secs
from
   stats$system_event b,
   stats$system_event e,
   stats$snapshot     sn
where
   snap_time > sysdate-&1
and
   e.snap_id = sn.snap_id
and
   b.snap_id = e.snap_id-1
and
   b.event = e.event
and
   (
      e.event like 'SQL*Net%'
      or
      e.event in (
Listing 4.2 Using the Oracle Statspack Utility to Monitor a Too-Small Shared Pool (Continued)
         'latch free',
         'enqueue',
         'LGWR wait for redo copy',
         'buffer busy waits'
      )
   )
and
   e.total_waits - b.total_waits > 100
and
   e.time_waited_micro - b.time_waited_micro > 100;
------------- ------------------------------ ----------- ------------2003-08-26 03 SQL*Net message to client 2003-08-26 03 SQL*Net message from client 2003-08-26 03 SQL*Net more data from client 2003-08-26 03 SQL*Net break/reset to client 2003-08-26 04 latch free 2003-08-26 04 enqueue 2003-08-26 04 buffer busy waits 2003-08-26 04 LGWR wait for redo copy 2003-08-26 04 SQL*Net message to client 2003-08-26 04 SQL*Net more data to client 2003-08-26 04 SQL*Net message from client 2003-08-26 04 SQL*Net more data from client 2003-08-26 04 SQL*Net break/reset to client 2003-08-26 05 latch free 2003-08-26 05 enqueue 2003-08-26 05 LGWR wait for redo copy 2003-08-26 05 SQL*Net message to client 2003-08-26 05 SQL*Net more data to client 2003-08-26 05 SQL*Net message from client
In Oracle9i, release 2, the v$shared_pool_advice shows the marginal differences in SQL parses as the shared pool changes in size from 10 percent of the current value to 200 percent of the current value.
The Oracle documentation contains a complete description of the setup and use of shared pool advice, which is simple to configure. Once it is installed, you can run a simple script to query the v$shared_pool_advice view and see the marginal changes in SQL parses for different shared_pool sizes (Listing 4.4). Here we see the statistics for the shared pool in a range from 50 percent of the current size to 200 percent of the current size. These statistics can give you a great idea about the proper size for the shared_pool_size. If you are automating the SGA region sizes with automated alter system commands, creating this output and writing a program to interpret the results is a great way to ensure that the shared pool and library cache always have enough RAM. Next, let's examine the most important SGA component: the internal data buffers.
select name, block_size, (1-(physical_reads/ decode(db_block_gets+consistent_gets, 0, .001, db_block_gets+consistent_gets)))*100 cache_hit_ratio from v$buffer_pool_statistics;
Here, we see the output from this script. Note that the names of the sized block buffers remain DEFAULT and you must select the
block_size column to differentiate between the buffers. Here we see all seven data buffers.
NAME         BLOCK_SIZE  CACHE_HIT_RATIO
-----------  ----------  ---------------
DEFAULT          32,767              .97
RECYCLE          16,384              .61
KEEP             16,384             1.00
DEFAULT          16,384              .92
DEFAULT           4,096              .99
DEFAULT           8,192              .98
DEFAULT           2,048              .86
This report is not very useful because the v$sysstat view only shows averages since the instance was started. To perform self-tuning of the data buffers, we can use Oracle's Statspack utility to measure the DBHR every hour. It would be ideal if you could create one buffer for each database page, ensuring that Oracle would read each block only once. With Oracle8i and the very large memory features, it's now possible to specify a data buffer that is large enough to hold an entire multigigabyte database, but most large databases do not have enough RAM to allow for the full caching of data pages. In Oracle8i, we have three buffer pools for holding data blocks:

1. DEFAULT pool - used for all data blocks that are not marked for the KEEP or RECYCLE pools
2. KEEP pool - reserved for tables and indexes that are used frequently
3. RECYCLE pool - reserved for data blocks that are read when performing large FTSs

Objects are directed to a pool with the storage clause, as sketched below. Because most Oracle databases do not have enough RAM to cache the whole database, the data buffers manage the data blocks to reduce disk I/O. Oracle utilizes an LRU algorithm to determine which database pages are to be flushed from memory. As mentioned earlier, the measure of the effectiveness of the data buffer is called the DBHR. This ratio computes the likelihood that a data block is present in the data buffer when the block is requested. The more data blocks that are found in the buffer, the higher the DBHR. Oracle recommends that all databases exhibit a DBHR of at least 90 percent.
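A one-line sketch of the pool assignment (the table name is hypothetical):

-- Keep a hot lookup table's blocks in the KEEP pool
alter table customer_type storage (buffer_pool keep);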
It's important to note that the DBHR is an elapsed-time measurement. If you use the Oracle Statspack utility to compute the DBHR over short intervals (every five minutes), you will see that the buffer hit ratio varies from 50 to 100 percent, depending upon the type of SQL requests that are being processed. Many Oracle shops will keep their buffer hit ratio information in the Statspack tables and plot it to show trends in the effectiveness of the data buffer in reducing I/O. Figure 4.1 shows an example of a plot of Statspack data for the DBHR.

The predictive models for Oracle RAM areas began with the v$db_cache_advice utility in Oracle9i. The new v$db_cache_advice view is similar to an Oracle7 utility that also predicted the benefit of adding data buffers. The Oracle7 utility used the x$kcbrbh view to track buffer hits and the x$kcbcbh view to track buffer misses. Oracle9i, release 2 now has three predictive utilities:

1. PGA advice - Oracle9i has introduced a new advisory utility dubbed v$pga_target_advice. This utility will show the marginal changes in optimal, one-pass, and multipass PGA execution for different sizes of pga_aggregate_target, ranging from 10 to 200 percent of the current value.
2. Shared pool advice - this advisory functionality has been extended in Oracle9i, release 2 to include a new advice called v$shared_pool_advice. There is talk of expanding the advice facility to all SGA RAM areas in future releases of Oracle.
3. Data cache advice - the v$db_cache_advice utility shows the marginal changes in physical data block reads for different sizes of db_cache_size. Bear in mind that Statspack can provide data similar to v$db_cache_advice, and most Oracle tuning professionals use Statspack and v$db_cache_advice together to monitor the effectiveness of their data buffers. These advisory utilities are extremely important for the Oracle DBA who must adjust the sizes of the RAM areas to meet current processing demands.
Using v$db_cache_advice
The following query can be used to perform the cache advice function, once the db_cache_advice has been enabled and the database has run long enough to give representative results.
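First, the advisory must be turned on; a minimal sketch (the parameter can later be set to ready or off to reduce measurement overhead):

alter system set db_cache_advice = on;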
-- *********************************************************** -- Display cache advice -- *********************************************************** column column column column c1 c2 c3 c4 heading heading heading heading 'Cache Size (meg)' 'Buffers' 'Estd Phys|Read Factor' 'Estd Phys| Reads' format format format format 999,999,999,999 999,999,999 999.90 999,999,999
select size_for_estimate c1, buffers_for_estimate c2, estd_physical_read_factor c3, estd_physical_reads c4 from v$db_cache_advice where name = 'DEFAULT' and block_size = (SELECT value FROM V$PARAMETER WHERE name = 'db_block_size') and advice_status = 'ON';
The output from the script is shown below. Note that the values range from 10 percent of the current size to double the current size of the db_cache_size.
Estd Phys Estd Phys Cache Size (meg) Buffers Read Factor Reads ---------------- --------- ----------- -----------30 3,802 18.70 192,317,943 <== 10% size 60 7,604 12.83 131,949,536 91 11,406 7.38 75,865,861 121 15,208 4.97 51,111,658 152 19,010 3.64 37,460,786 182 22,812 2.50 25,668,196 212 26,614 1.74 17,850,847 243 30,416 1.33 13,720,149 273 34,218 1.13 11,583,180 304 38,020 1.00 10,282,475 Current Size 334 41,822 .93 9,515,878 364 45,624 .87 8,909,026 395 49,426 .83 8,495,039 424 53,228 .79 8,116,496 456 57,030 .76 7,824,764 486 60,832 .74 7,563,180 517 64,634 .71 7,311,729 547 68,436 .69 7,104,280 577 72,238 .67 6,895,122 608 76,040 .66 6,739,731 <== 2x size
From this listing, we see that increasing db_cache_size from 304 MB to 334 MB would result in approximately 700,000 fewer physical reads. These advisory utilities are important for Oracle9i DBAs who must adjust their SGA regions to meet current processing demands. Remember, SGA tuning is an iterative process, and busy shops continually monitor and adjust the size of their data cache, PGA, and shared pool.
The hit ratio is computed as:

   DBHR = 1 - ((physical reads - physical reads direct) / session logical reads)

It should be noted that the formula for calculating the hit ratio in Oracle7 and Oracle8 does not include direct block reads; direct block reads became a separate statistic in Oracle8i. It is important to realize that the DBHR is only one small part of Oracle tuning. You should also use Statspack, interrogate system wait events, and tune your SQL for optimal execution plans. The hit ratio for Oracle8i can be gathered from the v$ views, as shown below. However, the value is not very useful because it shows the total buffer hit ratio since the beginning of the instance.
select
   1 - ((a.value - b.value)/d.value)  "Cache Hit Ratio"
from
   v$sysstat a,
   v$sysstat b,
   v$sysstat d
where
   a.name = 'physical reads'
and
   b.name = 'physical reads direct'
and
   d.name = 'session logical reads';
Many novice DBAs make the mistake of using the DBHR from the v$ views. The v$buffer_pool_statistics view does contain the accumulated values for data buffer pool usage, but computing the DBHR from the v$ tables only provides the average since the database was started. For the DBA to determine how well the buffer pools are performing, it is necessary to measure the hit ratio at more frequent intervals. Calculating the DBHR for Oracle8 and beyond is more complicated than in earlier versions, but the results enable the DBA to achieve a higher level of tuning than was previously possible. In the next section, we will look at the wealth of information that Statspack can provide for tracking buffer pool utilization and computing the DBHR.
set pages 9999;

column logical_reads      format 999,999,999
column phys_reads         format 999,999,999
column phys_writes        format 999,999,999
column "BUFFER HIT RATIO" format 999
select
   to_char(snap_time,'yyyy-mm-dd HH24'),
   a.value + b.value  "logical_reads",
   c.value            "phys_reads",
   d.value            "phys_writes",
   round(100 *
        (((a.value-e.value)+(b.value-f.value))-(c.value-g.value)) /
        ((a.value-e.value)+(b.value-f.value)))
                      "BUFFER HIT RATIO"
from
   perfstat.stats$sysstat a,   -- consistent gets, current snapshot
   perfstat.stats$sysstat b,   -- db block gets, current snapshot
   perfstat.stats$sysstat c,   -- physical reads, current snapshot
   perfstat.stats$sysstat d,   -- physical writes, current snapshot
   perfstat.stats$sysstat e,   -- consistent gets, prior snapshot
   perfstat.stats$sysstat f,   -- db block gets, prior snapshot
   perfstat.stats$sysstat g,   -- physical reads, prior snapshot
   perfstat.stats$snapshot sn
where
   a.snap_id = sn.snap_id
and b.snap_id = sn.snap_id
and c.snap_id = sn.snap_id
and d.snap_id = sn.snap_id
and e.snap_id = sn.snap_id-1
and f.snap_id = sn.snap_id-1
and g.snap_id = sn.snap_id-1
and a.name = 'consistent gets'
and e.name = 'consistent gets'
and b.name = 'db block gets'
and f.name = 'db block gets'
and c.name = 'physical reads'
and g.name = 'physical reads'
and d.name = 'physical writes';
yr.  mo dy Hr BUFFER_POOL_NAME       BHR
------------- -------------------- -----
2001-12-12 15 DEFAULT
2001-12-12 15 KEEP
2001-12-12 15 RECYCLE
2001-12-12 16 DEFAULT
2001-12-12 16 KEEP
2001-12-12 16 RECYCLE
This script provides us with the DBHR for each of the buffer pools at one-hour intervals. It is important that the KEEP pool always have a 99 to 100 percent DBHR. If this is not the case, data blocks should be added to the KEEP pool to make it the same size as the sum of all object data blocks assigned to the KEEP pool; a sketch of that sizing query appears below. To summarize, the DBA can control the DBHR by adjusting the number of buffer blocks within the Oracle parameters. Oracle recommends that the DBHR not fall below 90 percent.
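A minimal sketch of that sizing query, assuming objects are assigned to the pool through their BUFFER_POOL attribute:

select
   sum(s.blocks)  keep_pool_target_blocks
from
   dba_segments s
where
   (s.owner, s.segment_name) in
      (select owner, table_name from dba_tables  where buffer_pool = 'KEEP'
       union all
       select owner, index_name from dba_indexes where buffer_pool = 'KEEP');

The result can be compared with the current KEEP pool size (buffer_pool_keep, or db_keep_cache_size in Oracle9i) to decide whether the pool needs to grow.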
column mydate heading 'yr.  mo dy Hr.'

select
   to_char(snap_time,'yyyy-mm-dd HH24')  mydate,
   new.name                              buffer_pool_name,
   (((new.consistent_gets-old.consistent_gets)+
     (new.db_block_gets-old.db_block_gets))-
    (new.physical_reads-old.physical_reads)) /
   ((new.consistent_gets-old.consistent_gets)+
    (new.db_block_gets-old.db_block_gets))   bhr
from
   perfstat.stats$buffer_pool_statistics old,
   perfstat.stats$buffer_pool_statistics new,
   perfstat.stats$snapshot sn
where
   (((new.consistent_gets-old.consistent_gets)+
     (new.db_block_gets-old.db_block_gets))-
    (new.physical_reads-old.physical_reads)) /
   ((new.consistent_gets-old.consistent_gets)+
    (new.db_block_gets-old.db_block_gets)) < .90
and new.name = old.name
and new.snap_id = sn.snap_id
and old.snap_id = sn.snap_id-1
;
Note: only packages can be pinned. Stored procedures cannot be pinned unless they are placed into a package. The choice of whether to pin a package in memory is a function of the size of the object and the frequency of its use. Large packages that are called frequently might benefit from pinning, but any difference might go unnoticed because the frequent calls to the procedure have kept it loaded into memory anyway. Therefore, because the object never pages out in the first place, pinning has no effect. Also, the way procedures are grouped into packages can have some influence. Some Oracle DBAs identify high-impact procedures and group them into a single package, which is pinned in the library cache.

In an ideal world, the shared_pool parameter of the init.ora should be large enough to accept every package, stored procedure, and trigger that may be used by the applications. However, reality dictates that the shared pool cannot grow indefinitely, and wise choices must be made in terms of which packages are pinned. Because of their frequent usage, Oracle recommends that the standard, dbms_standard, dbms_utility, dbms_describe, and dbms_output packages always be pinned in the shared pool. The following snippet demonstrates how the sys.standard package can be pinned:
Connect system/manager as sysdba; @/usr/oracle/rdbms/admin/dbmspool.sql EXECUTE dbms_shared_pool.keep('sys.standard');
A standard procedure can be written to pin all of the recommended Oracle packages into the shared pool. Here is the script:
EXECUTE dbms_shared_pool.keep('DBMS_ALERT');
EXECUTE dbms_shared_pool.keep('DBMS_DDL');
EXECUTE dbms_shared_pool.keep('DBMS_DESCRIBE');
EXECUTE dbms_shared_pool.keep('DBMS_LOCK');
EXECUTE dbms_shared_pool.keep('DBMS_OUTPUT');
EXECUTE dbms_shared_pool.keep('DBMS_PIPE');
EXECUTE dbms_shared_pool.keep('DBMS_SESSION');
EXECUTE dbms_shared_pool.keep('DBMS_SHARED_POOL');
EXECUTE dbms_shared_pool.keep('DBMS_STANDARD');
EXECUTE dbms_shared_pool.keep('DBMS_UTILITY');
EXECUTE dbms_shared_pool.keep('STANDARD');
The DBA also needs to remember to run the pin.sql script whenever restarting a database. This is done by reissuing the PIN commands from a startup trigger immediately after the database has been restarted. Listing 4.7 shows a handy script to look at pinned packages in the SGA. The output from this listing should show those packages that are frequently used by your application. This is an easy way to tell how many times a nonpinned stored procedure was swapped out of memory and required a reload.

To effectively measure memory, two methods are recommended. The first method is to regularly run the estat-bstat utility (usually located in ~/rdbms/admin/utlbstat.sql and utlestat.sql) to measure SGA consumption over a range of time. The second method is to write a snapdump utility to interrogate the SGA and note any exceptional information relating to the library cache. This would include the following measurements:

- Data dictionary hit ratio
- Library cache miss ratio (see the sketch below)
- Individual hit ratios for all namespaces

Also, be aware that the relevant parameter, shared_pool_size, is used for other objects besides stored procedures. This means that one parameter fits all; Oracle offers no method for isolating the amount of storage allocated to any subset of the shared pool.
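A hedged sketch of one such measurement, the library cache miss (reload) ratio, drawn from the v$librarycache view:

select
   sum(reloads) / sum(pins)  library_cache_miss_ratio
from
   v$librarycache;

A ratio that persistently exceeds roughly 1 percent is a common signal that shared_pool_size may be undersized.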
- Last program: provides the name of the last program that the user was executing at the time of system logoff.
- Last action: provides the last action performed by the user during the session.
- Last module: provides the name of the last module accessed by the user prior to logoff time.
- Logoff date: an Oracle DATE data type corresponding to the actual user logoff time, accurate to the second.

Now we know the information available to us at logon and logoff, but how do we collect this information and make it accessible to management? Let's take a look at the available options.
3. Host: this uses Oracle's SYS_CONTEXT function to capture the name of the host from which the Oracle session originated. Please note that capturing the host name is vital for systems using Oracle Parallel Server or Real Application Clusters (RAC), because we can have many sessions connecting from many different instance hosts.
4. Logon date: this captures the date of the actual work logon, accurate to the second. Notice how we partitioned the logon date into two separate fields. Having a separate field for logon day and logon time produces a reader-friendly report.

Now that the logon trigger is in place, we have the challenge of creating a logoff trigger to capture all of the information required to complete the elapsed time for the user session; a minimal sketch follows.
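A hedged sketch of such a logoff trigger, assuming a hypothetical stats$user_log audit table keyed by the audit session ID:

create or replace trigger logoff_audit_trigger
before logoff on database
begin
   update stats$user_log
      set last_program = (select program from v$session
                           where audsid = sys_context('USERENV','SESSIONID')),
          last_action  = (select action  from v$session
                           where audsid = sys_context('USERENV','SESSIONID')),
          last_module  = (select module  from v$session
                           where audsid = sys_context('USERENV','SESSIONID')),
          logoff_day   = trunc(sysdate),
          logoff_time  = to_char(sysdate, 'hh24:mi:ss')
    where session_id = sys_context('USERENV','SESSIONID');
end;
/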
We'll take a look at a few sample reports that can be produced by the system. These reports can be enhanced to fit specific needs. It is now obvious why the precomputing of elapsed minutes is such a valuable feature: it produces a more useful report.
If user IDs can be correlated directly to screen functions, then the Oracle administrator can get a good idea of the amount of usage within each functional area of the Oracle applications (Listing 4.11). Now let's examine yet another type of report.
As we can see, this produces a clear graph showing user activity by the hour of the day. Once you get a large amount of user activity in your system, you can also summarize this information by the day of the week or the hour of the day. This provides a tremendous amount of information regarding the user signature for the system. By signature, we mean trend lines or spikes in user activity. For example, we might see high user activity every Wednesday afternoon at 1:00 PM. Using this user audit table, we can quickly identify these user signatures and adjust Oracle to accommodate these changes in end-user usage.

Related DDL, system errors, and user activity can easily be captured using the system-level triggers. However, it is clear that system-level triggers are not as sophisticated as they might be, and Oracle indicates that efforts are underway to enhance system-level trigger functionality with the introduction of Oracle10i. However, the judicious use of the system logon and system logoff triggers can provide an easy and reliable tracking mechanism for Oracle user activity. For the Oracle administrator who is committed to tracking user activity over long-term periods, the user audit table can provide a wealth of interesting user information, including user usage signatures, aggregated both by the hour of the day and the day of the week (a sketch of such a summary follows).
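A minimal sketch of such a summary, again assuming the hypothetical stats$user_log audit table with a character logon_time column:

select
   substr(logon_time, 1, 2)  hour_of_day,
   count(*)                  session_count
from
   stats$user_log
group by
   substr(logon_time, 1, 2)
order by
   1;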
The archived redo log could be added to the standby database and the database could be started in just a few minutes. Note: Oracle standby servers need to be fully licensed if they are hot standby servers that can be used for queries; cold standby servers used less than 10 days per year do not need to be licensed.
CONCLUSION
The point of this chapter is that the SGA must be properly designed and that the SGA can change dramatically as the system is implemented. Hence, you must constantly monitor and change the Oracle instance to optimize it according to your current demands. Next, let's take a look at the proper design for tablespaces and review the various Oracle physical design options.
5
ORACLE TABLESPACE DESIGN
INTRODUCTION
Unlike the CODASYL databases of the 1980s, today's Oracle databases allow tables to grow according to specified rules and procedures. In the Oracle model, one or more tables may reside in a tablespace. A tablespace is a predefined container for the tables that maps to fixed files of a finite size. Tables assigned to the tablespace may grow according to the growth rules specified for them, but the size of the tablespace supersedes the expansion rules. In other words, a table may have more extents available according to the table definition, but there may not be room in the tablespace to allocate those extents.

Over the past few years, Oracle has gradually recognized the benefits of bitmap data structures. As Oracle has evolved, we've seen the following progressive introduction of bitmaps into the database engine:

- Oracle7: bitmap indexes
- Oracle8: locally managed tablespaces (LMTs)
- Oracle8i: bitmap freelists
- Oracle9i: automatic segment space management (ASSM)
- Oracle Database 10g: automatic storage management (ASM)

It's important to note that these are optional structures. Bitmap ASSM in Oracle9i is optional and can only be implemented at the tablespace level; existing systems can continue to use the traditional method of free list management. In Oracle Database 10g, LMTs have become the default. This chapter will begin with a discussion of Oracle data blocks and then review the segment storage parameters. We'll then explore the Oracle tablespace options and see how proper physical design can improve performance and manageability.
Now that we see the block sizing issues, let's examine the storage options within Oracle tablespaces.

In Oracle9i, we expect an error if we try to specify PCTFREE or PCTUSED for a table defined inside a tablespace that uses automatic segment space management, for example (representative values shown):
SQL> create table
  2     test_table
  3  (c1 number)
  4  tablespace
  5     asm_test
  6  pctfree 20
  7  pctused 30
  8  ;
However, here we see an important point. While Oracle9i rejects the PCTFREE and PCTUSED parameters for LMTs with automatic segment space management, it does allow you to enter invalid settings for the NEXT and FREELISTS parameters:
SQL> create table
  2     test_table
  3  (c1 number)
  4  tablespace
  5     asm_test
  6  storage ( freelists 30
  7  next 5m ) ;

Table created.
This could be a serious issue for Oracle professionals unless they remember that LMTs with automatic segment space management ignore any specified values for NEXT and FREELISTS. Before we explore the details of designing with each of these options, it's important to understand the segment storage options and see how they relate to the tablespace options. Let's start with a review of the segment storage parameters.
PCTFREE: this storage parameter reserves room in each data block for the future expansion of rows, so that updated rows need not chain onto other blocks. The purpose of PCTFREE is to tell Oracle when to remove a block from the object's free list. Since the Oracle default is PCTFREE=10, blocks remain on the free list while they are less than 90 percent full. Once an insert makes the block grow beyond 90 percent full, it is removed from the free list, leaving 10 percent of the block for row expansion. Furthermore, the data block will remain off the free list even after the space drops below 90 percent. Only after subsequent deletes cause the space to fall below the PCTUSED threshold of 40 percent will Oracle put the block back onto the free list.

PCTUSED: this storage parameter determines when a block can relink onto the table free list after DELETE operations. Setting a low value for PCTUSED results in high performance; a higher value of PCTUSED results in more efficient space reuse but slower performance. As rows are deleted from a table, the database blocks become eligible to accept new rows: when the amount of used space in a database block falls below PCTUSED, a free list relink operation is triggered. For example, with PCTUSED=60, all database blocks that are less than 60 percent full will be on the free list, as well as other blocks that dropped below PCTUSED and have not yet grown back to PCTFREE. Once a delete leaves a block less than 60 percent full, the block goes back on the free list. As rows are inserted, the block fills until the used space exceeds the PCTFREE threshold, at which time the block is unlinked from the free list.

FREELISTS: Oracle allows tables and indexes to be defined with multiple free lists. All table and index free lists should be set to the high-water mark (HWM) of concurrent INSERT or UPDATE activity. Too low a value for FREELISTS will cause poor Oracle performance.

There is a direct trade-off between the setting for PCTUSED and efficient use of storage within the Oracle database. For databases where space is tight and storage within the Oracle data files must be reused immediately, the Oracle DBA will commonly set PCTUSED to a high value. This ensures the blocks go on the free list before they are nearly empty. However, the downside to this approach is that every time the data block fills, Oracle must unlink the data block from the free list and incur another I/O to get another free data block on which to insert new rows. In sum, the DBA must strike a balance between efficient space usage and the amount of I/O in the Oracle database. These parameters are set in the segment STORAGE clause, as sketched below.
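A minimal sketch, with representative values (table and tablespace names are hypothetical), of setting these parameters on a manually managed segment:

create table customer (
   cust_id    number,
   cust_name  varchar2(40)
)
tablespace user_data
pctfree 10                  -- keep 10 percent of each block free for row expansion
pctused 40                  -- relink a block to the free list when below 40 percent full
storage ( freelists 4 );    -- support up to four concurrent inserts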
Let's begin our discussion by introducing the relationship between object storage parameters and performance. Poor object performance within Oracle occurs in several areas:

- Slow INSERTs: INSERT operations run slowly and have excessive I/O. This happens when blocks on the free list have room for only a few rows before Oracle is forced to grab another free block.
- Slow SELECTs: SELECT statements have excessive I/O because of chained rows. This occurs when rows chain and fragment onto several data blocks, causing additional I/O to fetch the blocks.
- Slow UPDATEs: UPDATE statements run slowly with double the amount of I/O. This happens when updates expand a VARCHAR or BLOB column and Oracle is forced to chain the row contents onto additional data blocks.
- Slow DELETEs: large DELETE statements run slowly and cause segment header contention. This happens when rows are deleted and the database must relink the data block onto the free list for the table.

As you can see, the storage parameters for Oracle tables and indexes can have an important effect on the performance of the database. Let's take a look at the common storage parameters that affect Oracle performance.
There is a trade-off between the setting for PCTUSED and database performance on INSERT operations. In general, the higher the setting for PCTUSED, the less free space will be on reused data blocks at INSERT time. Hence, INSERT tasks will need to do more frequent I/Os than they would if they were inserting into empty blocks. In short, the value for PCTUSED should be set above 40 only when the database is short on disk space and must make efficient reuse of data block space.

It should now be clear that the average row length needs to be considered when customizing the values for PCTFREE and PCTUSED. You want to set PCTFREE such that room is left on each block for row expansion, and you want to set PCTUSED so that newly relinked blocks have enough room to accept rows. Herein lies the trade-off between effective space usage and performance. If you set PCTUSED to a high value, say 80, then a block will quickly become available to accept new rows, but it will not have room for many rows before it becomes logically full again. In the most extreme case, a relinked free block may have only enough space for a single row before causing another I/O operation. Remember that the lower the value for PCTUSED, the less I/O your system will have at INSERT time and the faster your system will run. The downside, of course, is that a block will be nearly empty before it becomes eligible to accept new rows. Because row length is a major factor in intelligently setting PCTUSED, a script can be written that allows the DBA to specifically control how many rows will fit onto a reused data block before it unlinks from the free list. Note that this script provides only general guidelines; you will want to leave the default PCTUSED=40 unless your system is low on disk space or unless the average row length is large. A sketch of such a script follows.
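A minimal sketch of such a script (not the author's original), assuming an 8K block size; rows_per_relink is a hypothetical knob for how many average-width rows a relinked block should accept:

define rows_per_relink = 20
define block_size      = 8192

select
   table_name,
   avg_row_len,
   greatest(1,
      100 - round(100 * (&rows_per_relink * avg_row_len) / &block_size))
      suggested_pctused
from
   dba_tables
where
   owner = upper('&owner');

The arithmetic simply reserves enough free space below the PCTUSED threshold to hold the requested number of average-width rows. Now let's take a close look at free lists and see how a free list shortage can cause performance slowdowns.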
When many tasks simultaneously insert information into an Oracle table, tasks will have to wait their turn to get access to the segment header. In sum, any time buffer busy waits occur, the Oracle DBA must try to find those tables or indexes that are experiencing the segment header contention and increase the FREELISTS or freelist_groups parameters. The freelist_groups parameter allows an Oracle table to have several segment headers, so that multiple tasks can insert into the table. The FREELISTS parameter should be set equal to the HWM of concurrent inserts for the target table, and it can be raised on an existing object, as in the sketch below.
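A hedged sketch (the table name is hypothetical) of relieving segment header contention on an existing table:

alter table customer storage ( freelists 10 );

In releases that permit it, the freelist groups setting can be adjusted the same way through the FREELIST GROUPS storage keyword.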
For tables that experience high-volume INSERTs, you can turn off free list link/unlink operations. It takes fewer resources for Oracle to extend a table than to manage free lists. In effect, free lists can be turned off by setting PCTUSED to 1. This causes the free lists to be populated exclusively from new extents. This approach requires lots of extra disk space, and the table must be reorganized periodically to reclaim space. Let's review the general guidelines for setting object storage parameters:

- Always set PCTUSED to allow enough room to accept a new row. We never want free blocks that do not have enough room to accept a row; if we do, this will cause a slowdown, since Oracle will attempt to read five "dead" free blocks before extending the table to get an empty block.
- The presence of chained rows in a table means that PCTFREE is too low or that DB_BLOCK_SIZE is too small. In most cases within Oracle, RAW and LONG RAW columns make huge rows that exceed the maximum block size for Oracle, making chained rows unavoidable.
- If a table has simultaneous INSERT SQL processes, it needs to have simultaneous DELETE processes. Running a single purge job will place all of the free blocks on only one free list, and none of the other free lists will contain any free blocks from the purge.
- The FREELISTS parameter should be set to the HWM of concurrent updates to a table. For example, if the customer table has up to 20 end users performing INSERTs at any time, then the customer table should have FREELISTS=20.
- FREELIST GROUPS should be set to the number of Oracle Parallel Server instances that access the table. For partitioned objects and cases of segment header contention, freelist_groups may also be set for non-RAC systems.

The PCTFREE parameter is used to reserve space on each data block for the future expansion of row values (via the SQL UPDATE command). Table columns may be defined as allowing null values, which do not consume any space within the row, or with VARCHAR data types. A VARCHAR data type specifies the maximum allowable length for the column instance, and the acceptable range of values may be anywhere from 4 B (the size of the length holder) to the size of the field plus 4 B. Hence, a VARCHAR(2000) may range in size from 4 B to 2004 B. If an application initially stores rows with empty values and later fills in the values, the PCTFREE parameter can dramatically reduce I/O contention. If a block of storage is filled by the addition of a row, subsequent
updates to that row to fill in column values will cause the row to fragment, usually onto the next available contiguous block. Next, let's cover the main tablespace types within Oracle and show how an up-front tablespace design decision can make a huge difference after your system is implemented.
We can create a table in this tablespace with an unexpanded VARCHAR2(2000) data type by entering the following commands (the column names shown are representative). Later, we'll expand the rows and see if there is fragmentation.
create table test_frag (
   frag_id    number,          -- populated from test_frag_seq
   frag_data  varchar2(2000)   -- starts as a single space, later expanded
)
tablespace asm_test;
We now have a table named test_frag in a 2 K tablespace. The next step is to populate 4000 rows, with only a single space in the VARCHAR2 column:
declare
   myint integer := 1;
begin
   loop
      insert into test_frag values ( test_frag_seq.nextval, ' ' );
      myint := myint + 1;
      if myint > 4000 then
         exit;
      end if;
   end loop;
end;
/
Now that we have the rows inserted, let's take a look at the table statistics in DBA_TABLES:
Table                    % Free   NUM_ROWS  AVG_ROW_LEN  CHAIN_CNT
--------------------  ---------  ---------  -----------  ---------
TEST_FRAG                    10       4000            9          0
In DBA_SEGMENTS, we see that the table is in a single extent. We also see that we used 32 data blocks at 2 K per block to store the 4000 rows, which works out to roughly 125 data rows per block.
Now let's make a mess and expand the large VARCHAR2 column from 1 B to 2000 B. A minimal sketch of the expansion, using the representative column names from above:
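update test_frag
   set frag_data = rpad('x', 2000, 'x');

commit;

After the update, we see in DBA_SEGMENTS that the table is much larger: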
Table name      Tablespace Name  Buffer Pool      Bytes   Blocks  Extents
--------------  ---------------  -----------  ---------  -------  -------
TEST_FRAG       ASM_TEST         DEFAULT      9,437,184    4,608       24
Now our table is on 4608 blocks and the table has taken 24 extents. When we examine DBA_TABLES, we see that the table now has an average row length of 1378 and every single row has chained.
Table                    % Free   NUM_ROWS  AVG_ROW_LEN  CHAIN_CNT
--------------------  ---------  ---------  -----------  ---------
TEST_FRAG                    10       4000         1378       4000
Row chaining is a serious problem for the DBA and it appears that automatic space management is not appropriate for tables where you need to reserve space for large row expansions with PCTFREE.
Because Oracle knows the average length of the table rows (dba_tables.avg_row_len), it should be able to adjust PCTUSED to ensure that the relinked data block will have room for new rows. One benefit of automatic segment management is that the bitmap free lists are guaranteed to reduce buffer busy waits. Let's take a close look at this feature.

Prior to Oracle9i, buffer busy waits were a major issue. A buffer busy wait occurs when a data block is inside the data buffer cache but is unavailable, because another SQL INSERT statement needed to get a block on which to place its row. Without multiple free lists, every Oracle table and index had a single data block at the head of the table to manage free blocks for the object. Whenever any SQL INSERT ran, it had to go to this block and get a data block on which to place its row. Obviously, single free lists cause a backup: when multiple tasks want to insert into the same table, they are forced to wait while Oracle assigns free blocks, one at a time. Oracle's ASSM feature claims to improve the performance of concurrent DML operations significantly, because different parts of the bitmap can be used simultaneously, eliminating serialization for free space lookups.
The purpose of the dbms_repair.rebuild_freelists procedure is to coalesce bitmap free list blocks onto the master free list and zero out all other free lists for the segment. For tables and indexes accessed by RAC (using multiple free list groups), Oracle9i will evenly distribute all free blocks among the existing free list groups. This is an important feature for tables and indexes with multiple free lists, because the DBA no longer has to reorganize a table to rebalance the bitmap free lists. Here's an example of using this procedure to rebuild the free lists for the BOOK table:
execute dbms_repair.rebuild_freelists('PUBS','BOOK');
Once a table or index is allocated in this tablespace, the values for PCTUSED will be ignored and Oracle9i will automatically manage the free lists for the tables and indexes inside the tablespace. For objects created in this tablespace, the NEXT extent clause is now obsolete because of the LMT. The INITIAL parameter is still required because Oracle cannot know in advance the size of the initial table load. When using automatic space management, the minimum value for INITIAL is three blocks.

There is some debate about whether a one-size-fits-all approach is best for Oracle. In large databases, individual object settings can make a huge difference in performance and storage. Still, ASSM is a simpler and more efficient way of managing space within a segment. It completely eliminates any need to specify and tune the PCTUSED, FREELISTS, and FREELIST GROUPS storage parameters for schema objects created in the tablespace; if any of these attributes are specified, they are ignored. When you create an LMT using the CREATE TABLESPACE statement, the SEGMENT SPACE MANAGEMENT clause lets you specify how free and used space within a segment is to be managed. For example, the following statement creates tablespace mytbs1 with ASSM:
CREATE TABLESPACE mytbs1
   DATAFILE '/u01/oracle/data/mytbs01.dbf' SIZE 500M
   EXTENT MANAGEMENT LOCAL
   SEGMENT SPACE MANAGEMENT AUTO;
When an object such as a table or index is created in an LMT with ASSM enabled, there is no need to specify the PCTUSED or FREELISTS parameters: the in-segment free/used space is tracked using bitmaps instead of free lists. When you cannot use an LMT, and therefore the automatic segment space management feature, you have to depend on the traditional method of managing free lists and free list groups. ASSM offers the following benefits:

- It provides administrative ease of use by avoiding the specification of storage parameters.
- It is a good method for handling objects with varying row sizes.
- It provides better runtime adjustment for variations in concurrent access and avoids tedious tuning methods.
- It provides better multi-instance behavior in terms of performance and space utilization.

However, note that ASSM is available only with LMTs and their objects. A new column called SEGMENT_SPACE_MANAGEMENT has been added to the dba_tablespaces view to indicate the segment space management mode used by a tablespace. Use the Oracle procedure dbms_space.space_usage to report the space usage within each block of a bitmap managed block (BMB) segment; a sketch appears below. It provides information regarding the number of blocks in a segment within the following ranges of free space:

- 0 to 25 percent free space within a block
- 25 to 50 percent free space within a block
- 50 to 75 percent free space within a block
- 75 to 100 percent free space within a block
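A minimal sketch of calling dbms_space.space_usage (the owner and segment names are hypothetical, and the segment must reside in an ASSM tablespace):

set serveroutput on

declare
   v_unf  number; v_unfb  number;
   v_fs1  number; v_fs1b  number;
   v_fs2  number; v_fs2b  number;
   v_fs3  number; v_fs3b  number;
   v_fs4  number; v_fs4b  number;
   v_full number; v_fullb number;
begin
   dbms_space.space_usage(
      segment_owner      => 'SCOTT',
      segment_name       => 'TEST_FRAG',
      segment_type       => 'TABLE',
      unformatted_blocks => v_unf,
      unformatted_bytes  => v_unfb,
      fs1_blocks => v_fs1,  fs1_bytes => v_fs1b,    -- 0 to 25 percent free
      fs2_blocks => v_fs2,  fs2_bytes => v_fs2b,    -- 25 to 50 percent free
      fs3_blocks => v_fs3,  fs3_bytes => v_fs3b,    -- 50 to 75 percent free
      fs4_blocks => v_fs4,  fs4_bytes => v_fs4b,    -- 75 to 100 percent free
      full_blocks => v_full, full_bytes => v_fullb);
   dbms_output.put_line('Blocks 75 to 100 percent free: '||v_fs4);
end;
/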
One huge benefit of automatic segment management is that the bitmap freelists are guaranteed to reduce buffer busy waits. As reviewed earlier, a buffer busy wait occurs when a data block is inside the data buffer cache but is unavailable because another SQL INSERT statement needed to get a block on which to place its row. Without multiple freelists, every Oracle table and index had a single data block at the head of the table to manage the free blocks for the object, and every SQL INSERT had to go to this block to get a data block on which to place its row. Oracle's ASSM feature claims to improve the performance of concurrent DML operations significantly, since different parts of the bitmap can be used simultaneously, eliminating serialization for free space lookups. According to Oracle benchmarks, using bitmap freelists removes all segment header contention and allows for super-fast concurrent INSERT operations (Figure 5.1).
Figure 5.1 Oracle Corporation Benchmark on SQL INSERT Speed with Bitmap FREELISTS (rows inserted per second, automatic versus manual segment space management)
Along with the automatic segment management features, we get some new tools for the DBA. Let's take a look at how the Oracle9i DBA will use these tools.
ASSM is also well suited to objects with rows of highly varying size. Table 5.1 lists the values inside the four-bit space. The value of this bitmap indicates how much free space exists in a given data block. In traditional space management, each data block must be read from the free list to see if it has enough space to accept a new row. In Oracle9i, the bitmap is constantly kept up-to-date with changes to the block; because the overhead of free list processing has been reduced, blocks can be kept fuller, which reduces wasted space. Another enhancement of Oracle9i space management is that concurrent DML operations improve significantly, because different parts of the bitmap can be used simultaneously, thereby eliminating the need to serialize free space lookups.

Please note that Oracle9i segment control structures are much larger than traditional free list management structures. Because each data block entry contains the four-byte data block address and the four-bit free space indicator, each data block entry in the space management bitmap consumes approximately six bytes of storage. It is also important to note that space management blocks are not required to be the first blocks in the segment. In Oracle8, the segment headers were required to be the first blocks in the segment. In Oracle8i, this restriction was lifted and the DBA could allocate additional free lists with the ALTER TABLE command. In Oracle9i, Oracle automatically allocates new space management blocks when a new extent is created and maintains internal pointers to the bitmap blocks (refer to Figure 5.2). Just like traditional free lists, the BMB is stored in a separate data block within the table or index. Because Oracle does not publish the internals of space management, we must infer the structure from block dumps. Hence, this information may not be completely accurate, but it will give us a general idea about the internal mechanisms of Oracle9i automatic space management.
Figure 5.2 Bitmap Blocks and Data Blocks
Unlike the linear-linked list of traditional free lists, bitmap blocks are stored in a B-tree structure, much like a B-tree index. This new structure has important ramifications for concurrent DML. In traditional free lists, free blocks must be accessed one at a time, and this causes segment header contention in applications with high-volume INSERT operations. Because Oracle9i can traverse the bitmap blocks much like a B-tree index, multiple transactions can simultaneously access free blocks without locking or concurrency problems.

As we have noted, the purpose of the bitmap blocks is to track the free blocks in the segment. Since the free blocks are organized in a B-tree, we find the controlling information inside the segment control block. Three data blocks comprise the segment header. The extent control header block contains the following components:

- The extent map of the segment
- The last block at each level of the B-tree
- The low high-water mark (LHWM)
- The high high-water mark (HHWM)

The HWM in the segment header has also changed in Oracle9i bitmap blocks. Instead of having a single pointer to the highest free block in an object, the B-tree index structure allows for a range of HWM blocks. Hence, we see two pointers for the HWM:

1. LHWM: all blocks below this block have been formatted for the table.
2. HHWM: all blocks above this block have not been formatted.

Internally, the HHWM is required to ensure that Oracle direct load operations can access contiguous unformatted blocks. Let's look at each block in detail to understand how space is managed in bitmap segment control. The extent control header block contains the HHWM, the LHWM, the extent map, and the data block addresses for each of the three levels of bitmap blocks.
Extent 2 Bitmap Block
  Block:      0     1     2     3     4     5     6     7
  Freespace:  0001  0001  0001  0001  0001  0001  0001  0001

Extent 1 Bitmap Block
  Block:      0     1     2     3     4     5     6     7
  Freespace:  0001  0001  0001  0010  0010  0010  0010  0010

Extent 0 Bitmap Block
  Block:      0     1     2     3     4     5     6     7
  Freespace:  0100  0100  0100  0100  0101  0101  0101  0101
Figure 5.3 Segment Header Extent Map Points to All Extent Bitmaps in Segments
The extent map lists all of the data block addresses for each block within each extent within the segment and shows the four-bit free space of each block within the extent. Since the extent size is controlled by Oracle9i LMTs, each extent size within the tablespace is uniform, regardless of the NEXT extent size for each object in the tablespace. Note that the first three blocks of the first extent (blocks 0 through 2) are used for metadata and are not available for segment block addresses. For each extent in the segment, Oracle9i keeps an entry pointing to the bitmap for that segment (Figure 5.3). Oracle9i also keeps pointers to the last bitmap block within each logical bitmap level (Figure 5.4). This pointer structure allows Oracle9i to quickly access multiple bitmaps to improve the concurrency of high-volume INSERTs.
Figure 5.4 Extent Control Header Block Pointers to the Last Bitmap Block (BMB) at Each Logical Bitmap Level
Pros of ASSM:

- Reduced buffer busy waits: ASSM removes buffer busy waits better than using multiple freelists. When a table has multiple freelists, all purges must be parallelized to reload the freelists evenly; ASSM has no such limitation.
- Great for RAC: the bitmap freelists remove the need to define multiple freelist groups for RAC and provide overall improved freelist management over traditional freelists.

Cons of ASSM:

- Slow for full-table scans: several studies have shown that large-table full-table scans (FTSs) run longer with ASSM than with traditional freelist management. ASSM-FTS tablespaces are consistently slower than freelist-FTS operations. This implies that ASSM may not be appropriate for decision-support systems and warehouse applications, unless partitioning is used with Oracle Parallel Query.
- Slower for high-volume concurrent INSERTs: numerous experts have conducted studies showing that tables with high-volume bulk loads perform faster with traditional multiple freelists.
- ASSM will influence index clustering: for row-ordered tables, ASSM can adversely affect the clustering_factor for indexes. Bitmap freelists are less likely to place adjacent rows on physically adjacent data blocks, which can degrade the clustering_factor and reduce the cost-based optimizer's propensity to favor an index range scan.

It remains to be seen how many experienced DBAs will start using automatic space management and how many will continue to use the older method. Although automatic space management promises faster throughput for multiple DML statements, Oracle professionals must always be on the watch for chained rows caused by a generic setting for PCTFREE. The seasoned DBA may want to bypass these new features in order to control the behavior of the table rows inside the data blocks. Now let's examine how to design for Oracle replication.
REPLICATION DESIGN
Managing an Oracle data warehouse becomes challenging when we move into the distributed database environment. The challenge arises because so many components within the database software contribute to the overall performance. The number of concurrent users, the availability of space within the buffer and lock pools, and the balancing of access across processors can all affect database performance. When a data warehouse accesses several remote databases in a single warehouse query, another dimension of complexity is added. Not only must the DBA look at each individual database, but the DBA must also consider transactions that span several servers. Although accessing several servers in a distributed warehouse query may seem trivial, performance problems can be introduced by PC hardware, network bottlenecks, router overloads, and many other sources. Let's take a look at distributed data warehouses and examine how they differ from traditional data warehouse environments.
CONCLUSION
This chapter has been concerned with the physical design of the file and tablespace structures. The main points of this chapter include:
- LMTs are now the default in Oracle Database 10g and should be used whenever feasible.
- ASSM relieves the one-way chains of free list management and relieves buffer busy waits on high-update segments.
- You still have the option of manually setting PCTFREE, PCTUSED, and FREELISTS for individual segments.

Now we are ready to look at the physical design options for Oracle tables and indexes. The next chapter will explore all of the table options to illustrate the advantages of each and show how they interface with the logical database design.
6
ORACLE TABLE DESIGN
INTRODUCTION
The data storage structures within Oracle are plentiful and range from simple to complex. During your physical design, you have a choice between standard relational tables and one of the many table extensions offered within the Oracle software. This chapter will address the following table design issues:

- Design with replicated tables
- Design with external tables
- Design with materialized views
- Design with Oracle object tables
- Design with ADTs
- Design with Oracle OIDs
- Design with pointer-based tables
- Design with nested tables
- Design with Oracle member methods

Let's begin with a review of the Oracle replicated table structure and see how it can be incorporated into a physical design.
One of the biggest mistakes a company can make is to implement advanced replication when all it needs are read-only materialized views. With replication, more capability means harder to implement, harder to maintain, harder to troubleshoot, and more demanding of your time. Here are some criteria that you can use to determine the level of replication that best fits your situation.
Implementing replication in a changing database will entail a significant increase in the DBA's workload.
Does the Replicated Site Require the Ability to Replicate to Another Site?
A master site can replicate with other sites. If the remote site only replicates with one master site, use updatable materialized views. If the remote site must replicate the data further, then it too must be a master site and multimaster replication is required. As you might have figured, replication is difficult to understand and time-consuming to set up. But its daunting reputation is much worse than the reality; once you get it set up and operating, you will find it really isn't very intimidating. Remember to replicate at the lowest level possible. Don't use advanced replication where basic replication will work (a minimal sketch of basic replication follows), and don't try to replicate more objects than your server and network are able to support.
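A minimal sketch of basic replication, assuming a database link named remote_db and a materialized view log on the master customer table to support fast refresh:

create materialized view cust_mv
   refresh fast
   start with sysdate
   next sysdate + 1/24
as
   select * from customer@remote_db;

Now let's look at Oracle9i external tables and see how they can be incorporated into your logical data model.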
(Figure: UTL_FILE writes operating system flat files such as c:\oracle\web_log.txt, while external tables read flat files such as c:\oracle\flat_file.csv into the Oracle database.)
The file contains the following employee information:

- Employee ID
- Last name
- Job description
- Manager's employee ID
- Hire date
- Salary
- Commission
- Department

So, how do we define this file to Oracle? First, we must create an Oracle directory entry in the data dictionary that points to the Windows directory where the flat file resides. In this example, we'll name the directory testdir and point it to c:\docs\pubsdb\queries:
SQL> create directory testdir as 'c:\docs\pubsdb\queries';

Directory created.
Now that we have the directory, we can define the structure of the external file to Oracle. You'll see this code in Listing 6.1. In this syntax, we define the columns of the external table in much the same way as you would an internal Oracle table. The external definitions occur in the organization external clause, as shown in Table 6.1. Now that we've defined the external table, we can run reports against it using SQL, just as if the table resided inside the database. In the query shown in Listing 6.2, note the use of the sophisticated ROLLUP parameter to summarize salaries by both department and job title. The results are available in Listing 6.3.

Because external tables are new, Oracle has not yet perfected their use. In Oracle9i, the feature has several limitations, including:

- No support for DML: external tables are read-only, but the base data can be edited in any text editor.
- Poor response for high-volume queries: external tables have a processing overhead and are not suitable for large tables.

Accessing flat files via Oracle has a number of uses. For example, you can define spreadsheets to Oracle. This technique has important
ramifications for shops where users can control systemwide parameters inside desktop spreadsheets, because Oracle knows immediately about any changes. The advent of external tables in Oracle9i is exciting because it allows SQL queries to access any type of flat file, as if the data were stored inside an Oracle table. We'll examine some caveats to this new approach, specifically:
- The external file must be comma-delimited and stored on the server as a file with a .csv extension.
- External spreadsheets are not good for large files, because the entire file must be reread into Oracle whenever a change is saved to the spreadsheet.
- End users must never reformat the data columns inside the spreadsheet environment.

The next code listing shows the syntax used to make the file appear as an Oracle external table. Note the .csv file name suffix.
create directory testdir as 'u01/oracle/oradata/testdb';

create table emp_ext (
   EMPNO     NUMBER(4),
   ENAME     VARCHAR2(10),
   JOB       VARCHAR2(9),
   MGR       NUMBER(4),
   HIREDATE  DATE,
   SAL       NUMBER(7,2),
   COMM      NUMBER(7,2),
   DEPTNO    NUMBER(2)
)
organization external (
   type oracle_loader
   default directory testdir
   access parameters (
      records delimited by newline
      fields terminated by ','
   )
   location ('emp_ext.csv')
)
reject limit 1000;
However, when defining the flat file as an external table, the file remains on the OS as a flat file, where it can be read and updated with a variety of tools, including spreadsheets. Using Excel, the external table data can be read just as if it were standard spreadsheet data (Figure 6.2). End users can now manage critical tables inside easy-to-use spreadsheets, and Oracle immediately notices whenever a change is made to the spreadsheet. However, there are important limitations to using spreadsheets as Oracle tables, the foremost being excessive disk I/O whenever the spreadsheet has changed. Let's take a closer look.
Because Oracle reads OS files in data blocks, we can compute the amount of disk I/O by determining the number of spreadsheet blocks with a simple shell script. In this script, we assume the Oracle database has an 8 KB block size:
bytes=`ls -al|grep emp_ext.csv|awk '{ print $5 }'`
num_bytes=`expr $bytes`
blocks=`expr $num_bytes / 8192`
echo $blocks
This script tells us exactly how many disk reads are required to access the Oracle external table whenever a change is made.
In UNIX, we can use this command to make the spreadsheet read-only for everyone except the owner of the spreadsheet:
chmod 744 emp_ext.csv
This ensures the file will not be updated, except by authorized users, and helps ensure that Oracle caches the data in an efficient manner. Once defined to Oracle, the spreadsheet will be accessible both through Oracle and the Excel spreadsheet.
Once the file has been saved, Oracle can no longer read the salary column because the values have been stored in quotes. To Oracle, this defines the column as a character:
7369,SMITH,CLERK,7902,17-Dec-80,800,20, 7499,ALLEN,SALESMAN,7698,20-Feb-81,"1,600",300,30 7521,WARD,SALESMAN,7698,22-Feb-81,"1,250",500,30 7566,JONES,MANAGER,7839,2-Apr-81,"2,975",,20 7654,MARTIN,SALESMAN,7698,28-Sep-81,"1,250",1400,30 7698,BLAKE,MANAGER,7839,1-May-81,"2,850",,30 7782,CLARK,MANAGER,7839,9-Jun-81,"2,450",,10 7788,SCOTT,ANALYST,7566,19-Apr-87,"3,000",,20 7839,KING,PRESIDENT,,17-Nov-81,"5,000",,10 7844,TURNER,SALESMAN,7698,8-Sep-81,"1,500",0,30 7876,ADAMS,CLERK,7788,23-May-87,"1,100",,20
The accidental reformatting of the file makes it unreadable by Oracle. You must take special care to instruct end users never to change the formatting. In summary, external tables are a great way to incorporate volatile data into the database without undergoing the task of physically loading the Oracle tables.
Next, let's examine how designing with materialized views can allow multiple data structures (1NF and 3NF) to exist within the same Oracle database.
A materialized view can be refreshed every ten minutes, every day, and so on, depending on the volatility of the data. Here is an example:
CREATE MATERIALIZED VIEW emp_sum
   ENABLE QUERY REWRITE
   REFRESH FAST
   START WITH SYSDATE
   NEXT SYSDATE + 1/24
AS
   SELECT deptno, job, SUM(sal)
   FROM   emp
   GROUP BY deptno, job;

Materialized view created.
In the above example, the materialized view is recreated (refreshed) every 1/24 of a day (once per hour). This refresh interval gives the database developer complete control over the currency of the materialized view and allows long-running, expensive SQL queries to run super fast.
Several system privileges must be granted to anyone using materialized views. These grant statements can often be grouped into a single role and the role granted to the end user:
grant query rewrite to nelson;

grant create materialized view to nelson;

alter session set query_rewrite_enabled = true;
- Robust partitioning: Oracle9i partitioning allows for multilevel keys, a combination of the range and list partitioning techniques (see the sketch after this list). The table is first range partitioned, and then each individual range partition is further subpartitioned using a list partitioning technique. Unlike composite range-hash partitioning, the content of each subpartition represents a logical subset of the data, described by its appropriate range and list partition setup.
- Faster backups: a DBA can back up a single partition of a table, rather than backing up the entire table, thereby reducing backup time.
- Less overhead: because older partitioned tablespaces can be marked as read-only, Oracle puts less stress on the redo logs, locks, and latches, thereby improving overall performance.
- Easier management: maintenance of partitioned tables is improved because maintenance can be focused on particular portions of tables. For maintenance operations across an entire database object, it is possible to perform these operations on a per-partition basis, thus dividing the maintenance process into more manageable chunks.
- Faster SQL: Oracle is partition-aware, and some SQL may improve in speed by several orders of magnitude (over 100 times faster):
  - Index range scans: partitioning physically sequences rows in index order, causing a dramatic improvement (over 10 times faster) in the speed of partition-key scans.
  - Full-table scans: partition pruning only accesses those data blocks required by the query.
  - Table joins: partitionwise joins take the specific subset of the query partitions, causing huge speed improvements on nested loop and hash joins.
  - Updates: Oracle Parallel Query for partitions improves batch access speed by isolating the affected areas of the query.

In summary, partitioning has a fast payback time, and the immediate improvements to performance and stress reduction on the Oracle server make it a slam-dunk decision.
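A minimal sketch of the range-list technique described above; all names and partition boundaries are illustrative:

create table sales (
   sale_date  date,
   region     varchar2(10),
   amount     number
)
partition by range (sale_date)
subpartition by list (region) (
   partition p_2003_q1 values less than (to_date('2003-04-01','yyyy-mm-dd')) (
      subpartition p_2003_q1_east values ('EAST'),
      subpartition p_2003_q1_west values ('WEST')
   ),
   partition p_2003_q2 values less than (to_date('2003-07-01','yyyy-mm-dd')) (
      subpartition p_2003_q2_east values ('EAST'),
      subpartition p_2003_q2_west values ('WEST')
   )
);

Next, let's examine the Oracle object-oriented table structures and see how they interface with physical design.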
(Figure: objects such as a report card and a class schedule map to database entities; a course entity with course_number is linked through enrolls and takes relationships.)
It is a challenge to many Oracle design professionals to know when to use these Oracle data model extensions. This section provides a brief review of advanced Oracle topics and how they are used to design high-performance Oracle databases.
Member methods are defined upon the Oracle object, and all processes that manipulate the object are encapsulated inside Oracle's data dictionary. This functionality has huge benefits for the development of all Oracle systems. Prior to the introduction of member methods, each Oracle developer was essentially a custom craftsman writing custom SQL to access Oracle information. By using member methods, all interfaces to the Oracle database are performed using pretested methods with known interfaces. This way, the Oracle developer's role changes from custom craftsman to more of an assembly-line coder: you simply choose from a list of prewritten member methods to access Oracle information.
Object Orientation and Oracle

Oracle9i offers numerous choices for the introduction of object-oriented data model constructs into relational database design. Oracle9i offers the ability to dereference table row pointers, along with ADTs and limited polymorphism and inheritance support. In Oracle9i, data model constructs used in C++ or Smalltalk programming can be translated directly into an Oracle structure. In addition, Oracle supports abstract data typing, whereby you create customized data types with the strong typing inherent in any of the standard Oracle data types like NUMBER, CHAR, VARCHAR, and DATE. For example, here is an Oracle8 table created with ADTs and a nested table.
CREATE OR REPLACE TYPE employee AS OBJECT (
   last_name        varchar(40),
   full_address     full_mailing_address_type,
   prior_employers  prior_employer_name_arr
);

create table emp of employee;
Next, we use extensions to standard Oracle SQL to populate these ADTs, as shown below:
insert into emp values (
   'Burleson',
   full_mailing_address_type('7474 Airplane Ave.','Rocky Ford','NC','27445'),
   prior_employer_name_arr('IBM','ATT')  -- hypothetical prior-employer values
);
Now, we create the parent table with the nested table as we see below.
create table emp1 (
   last_name        char(40),
   current_address  full_mailing_address_type,
   prev_address     nested_address
)
nested table prev_address store as nested_prev_address
return as locator;
(Figure: a pointer column in the customer table references rows in the order table, in the manner of PL/SQL VARRAYs.)
A nested table appears as a part of the master table; internally, it is a separate table. The store as clause allows the DBA to give the nested table a specific name (Figure 6.5). Note that the nested_prev_address subordinate table can be indexed just like any other Oracle table. Also, notice the use of the return as locator SQL syntax. In many cases, returning the entire nested table at query time can be time-consuming. The locator enables Oracle to use the pointer structures to dereference pointers to the location of the nested rows. A pointer dereference happens when you take a pointer to an object and ask the program to display the data the pointer is pointing to. In other words, if you have a pointer to a customer row, you can dereference the OID and see the data for that customer. The link to the nested tables uses an Oracle OID instead of a traditional foreign key value.
- ADT tables: creating user-defined data types simplifies Oracle database design, and ADTs provide uniform data definitions for common data items. There is no downside for SQL performance; the only downside for SQL syntax is the requirement that all references to ADTs be fully qualified.
- Nested tables: nested tables have the advantage of being indexed, and the repeating groups are separated into another table so as not to degrade the performance of full-table scans. Nested tables allow for an infinite number of repeating groups. However, it sometimes takes longer to dereference the OID to access the nested table entries than to perform ordinary SQL table join operations. Most Oracle experts see no compelling benefit of using nested tables over traditional table joins.
- VARRAY tables: VARRAY tables have the benefit of avoiding costly SQL joins, and they can maintain the order of the VARRAY items based upon the sequence in which they were stored (a sketch appears below). However, the longer row length of VARRAY tables causes full-table scans to run longer, and the items inside the VARRAY cannot be indexed. More importantly, VARRAYs cannot be used when the number of repeating items is unknown or large.
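A minimal sketch of a VARRAY table; the type mirrors the prior_employer_name_arr used earlier, and the table name is hypothetical:

create or replace type prior_employer_name_arr
   as varray(10) of varchar2(40);
/

create table emp2 (
   last_name        varchar2(40),
   prior_employers  prior_employer_name_arr
);

The VARRAY preserves the insertion order of its elements, so the first prior employer stored is always the first one returned.

There is much confusion concerning the implementation of the database object model by Oracle9i. The design of robust object systems using Oracle9i object features is explored in the following sections. Oracle designers are also shown how to plan ahead with these features in mind.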
It is noteworthy that ADTs were commonly used within prerelational databases. This ability was lost when the relational model was introduced. There were only a few allowable data types, such as numeric and character, in prerelational databases; however, these databases allowed component values to be grouped into larger units that could then be manipulated easily within the database. For example, if a full_address construct is copied into numerous record definitions, it can be handled as if it were a single unit. Prerelational databases such as IMS and IDMS supported ADTs, but strong typing features were not introduced until about 1990, with the first commercial object-oriented databases.

Actually, other DBMSs have offered many of Oracle9i's new features for years. For example, Dr. Won Kim developed UniSQL, a relational/object-oriented database that supports the concept of nested tables. In UniSQL, a data field in a table can be a range of values or an entire table. In this way, the domain of values can be defined for a specific field in a relational database, and relationship data can be incorporated directly into a table structure using nested data tables.

Having considered the basic idea behind ADTs, we can investigate some of the compelling benefits to this approach. ADTs are useful within an Oracle9i design for several reasons:

- Encapsulation: ADTs ensure uniformity and consistency because each type exists as a complete unit. Each type includes the data definitions, default values, and value constraints. Moreover, an ADT can interact with other ADTs. Regardless of where it appears in a database, the same logical data type always has the same definition, default values, and value constraints.
- Reusability: common data structures can be reused within many definitions, ensuring uniformity and saving coding time.
- Flexibility: the database object designer is able to model the real world by creating real-world representations of data.

If the data types are properly analyzed and incorporated into the database object model, abstract data typing is a powerful tool. Next, we'll consider implementing abstract data typing in Oracle9i. The requirement to model all types at their lowest level was one of the shortcomings of Oracle7. For example, the address information of a customer could be accessed only by manipulating street_address, city_address, and zip_code as three separate statements. Oracle9i makes it possible to create an ADT called full_address and manipulate it as if it were a single data type. This is a huge improvement for Oracle but, as already mentioned, prerelational databases supported this construct.
A data type composed of subtypes has always been available to COBOL users. For example, a full address can be defined in COBOL as follows:
05  CUSTOMER-ADDRESS.
    07  STREET-ADDRESS    PIC X(80).
    07  CITY-ADDRESS      PIC X(80).
    07  ZIP-CODE          PIC X(5).
The CUSTOMER-ADDRESS can then be treated as if it were a single entity, like this:
MOVE CUSTOMER-ADDRESS TO PRINT-REC.
MOVE SPACES TO CUSTOMER-ADDRESS.
The data type definition contains much more than just the data and the data size. Default values can also be assigned to the data types and value constraints specified. When the object is created, default values are assigned and the constraint checks occur. In this way, the database designer is assured complete control over the data definition and the values inserted into the data type. Customer_address can then be manipulated as a valid data type, and it can be used to create tables and select data as follows:
CREATE TABLE CUSTOMER (
   cust_name         full_name,
   customer_address  full_address,
   . . .
);
Once the Oracle table is defined, you can reference customer_address in your SQL just as if it were a single data type:
SELECT DISTINCT customer_address FROM CUSTOMER;

INSERT INTO CUSTOMER VALUES (
   full_name('ANDREW','S.','BURLESON'),
   full_address('123 1st st.','Minot, ND','74635'));

UPDATE CUSTOMER
SET customer_address = full_address(' ',' ',' ');
It is important to note that Oracle SQL SELECT syntax changes when accessing rows that contain ADTs. Here is the required Oracle SQL syntax to select a component within the full_address type:
SELECT customer_address.zip_code
FROM   CUSTOMER
WHERE  customer_address.zip_code LIKE '144%';
By taking the concept of ADTs one step further, we can see how ADTs can be nested within other data types.
Nesting ADTs
ADTs were introduced primarily to give the database designer the ability to reuse components consistently across the entire database domain. DBAs need to be comfortable nesting ADTs within other ADTs, because data types are often created to include other data types. For example, a data type could be created that includes the data in an existing table. That data type could then be used to define a new table, as follows:
CREATE TYPE customer_stuff AS OBJECT (
   customer_name     full_name,
   home_address      full_address,
   business_address  full_address);
After the customer_stuff type is defined, the table definition becomes simple:
CREATE TABLE CUSTOMER (customer_data customer_stuff);
Nesting ADTs in this way duplicates the object-oriented concept of encapsulation. In other words, groups of related data types are placed into a container that is completely self-contained while retaining the full capability of the innate relational data types, such as INT and CHAR. Nested ADTs are queried in the same fashion as the earlier examples, except that the target data types require several dots to delimit the levels of nesting within the data structure. For example, the query below displays the zip_code for a customer:
SELECT customer_data.home_address.zip_code
FROM   CUSTOMER
WHERE  customer_data.home_address.zip_code LIKE '144%';
The above reference to zip_code must be prefixed with home_address, since zip_code is nested within that data type. In turn, home_address is nested within the customer_data column of type customer_stuff. The proper SQL reference to zip_code is therefore expressed as follows:

customer_data.home_address.zip_code
With this general understanding of how ADTs function and operate within Oracle9i, we are ready to see how pointers are used within Oracle to establish relationships between table rows.
Modern object databases store objects in an object table with an assigned OID. An OID is guaranteed to be globally unique. Each OID is a 128-bit (16 B) value, displayed in hexadecimal. An OID cannot be used to locate an object instance by itself; only a REF (or reference, discussed later in this chapter) containing location data can be used to locate an object instance. The concept of the ROWID, introduced in Oracle7, is also used in Oracle9i to uniquely identify each row in each table in a database. The ROWID in Oracle7 is a VARCHAR2 representation of a binary value shown in hexadecimal format. It is displayed as:
bbbbbbbb.ssss.ffff
where:
   bbbbbbbb is the block ID
   ssss is the row sequence in the block
   ffff is the file ID

The ROWID can be used to detect and eliminate duplicate rows in instances where a primary key for a table allows duplicate values. ROWIDs for duplicate customers are displayed in the following SQL:
SELECT ROWID, cust_nbr
FROM   CUSTOMER a
WHERE  a.ROWID >
   (SELECT MIN(b.ROWID)
    FROM   CUSTOMER b
    WHERE  a.cust_nbr = b.cust_nbr);
The concept of objects and OIDs is new in Oracle9i. Objects in Oracle9i have identifiers attached to ROWIDs, giving an EXTENDED ROWID format. This format is 10 B long instead of the 6 B format used in Oracle7. The
ROWID in Oracle9i is a VARCHAR2 representation of a base 64 number. The Oracle9i ROWID is displayed as:
oooooo.fff.bbbbbb.sss
where:
   oooooo is the data object number
   fff is the relative file number
   bbbbbb is the block number
   sss is the slot number

Oracle9i gives DBAs the ability to create a distinct OID that uniquely identifies each row within a table. These OIDs are 128 bits (16 B) in length, as noted. Oracle guarantees that an OID is never reused, even after the row it identifies has been deleted. OIDs are similar to traditional pointers in legacy databases because, once embedded into a column, they can point to any row in the database. We have already mentioned that many prerelational databases used pointers and employed linked-list data structures. These data structures provided embedded pointers in the prefix of each database entity; the pointers established one-to-many and many-to-many relationships among entities. The design of pointer-based databases was elegant, since foreign keys were not needed to establish data relationships, but difficult implementation problems remained. The programmer had to remember the location, name, and type of each pointer in the database, making navigation a chore in network databases, such as CA-IDMS, and hierarchical databases, such as IMS. Considering these limitations, a declarative language such as SQL is a luxury. Oracle database design is dramatically altered by the introduction of OIDs. SQL joins are no longer required to extract the values in an object, since the OID can simply be dereferenced. Unfortunately, there is a price to pay: special utility programs are required to sweep every affected object in the database when a new OID is added, because the data relationships are hard linked with embedded pointers. Also, when an Oracle9i object containing an OID is deleted, the designer must still keep track of it; otherwise, it is possible to have orphan OIDs that point to deleted objects. Oracle9i will alert you if a reference points to a deleted object: the SQL condition oid_column IS DANGLING returns TRUE (see the sketch below). Having clarified the concept of OIDs, we can investigate how to use pointers to navigate Oracle databases as an alternative to using JOIN operations to navigate between tables.
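Here is the dangling-reference check referred to above, as a hedged sketch; the IS DANGLING condition is Oracle syntax, but the order_ref column name is hypothetical:

SELECT customer_name
FROM   CUSTOMER
WHERE  order_ref IS DANGLING;   /* order_ref is a hypothetical REF column */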
Oracle9i also allows the use of SQL to address rows by their OIDs, permitting the following pointer-based navigation:
SELECT customer_stuff FROM CUSTOMER WHERE OID = :host_variable;
The concept of retrieval by OIDs supplies Oracle database designers with a powerful new navigational tool. It means that you can navigate your database one row at a time with PL/SQL, capturing an OID as you retrieve a row and using that OID to access another row, and so on. In ordinary Oracle SQL, by contrast, the access path is usually determined at runtime by the SQL optimizer and therefore remains hidden. Now that we have a sophisticated understanding of OID-based navigation, we are ready to see how an Oracle database can be designed to contain repeating groups within a row definition. The introduction of repeating groups violates Codd's 1NF rule, so this feature is called non-1NF design.
Oracle was not the first to introduce lists of values into relational databases. Initially, non-1NF modeling raised the ire of the titans of relational modeling and was treated with suspicion. Repeating groups soon proved their utility, however, and became more respectable. C.J. Date introduced the set concept into the relational model to allow this construct to fit into the relational paradigm. Database designers now recognize that there are specific instances where the use of repeating groups improves an Oracle9i database design. In an environment where 1NF is violated and repeating groups are introduced into tables, a set of rules is needed to specify when repeating groups are acceptable. The following guidelines address this need:

   The size of the repeating data items should be small.
   Repeating data items should be stationary and rarely changed.
   Repeating data should never be queried as a set. In other words, you should never select all of the repeating values within a single SQL query.

This scenario illustrates the principle: suppose a university database needs to record that a student can take the ACT exam up to three times. There are only two choices without using repeating groups:

1. Unique columns can be created within the student table, assigning each repeating group a subscript, as follows:
CREATE TABLE STUDENT (
   student_ID       NUMBER(5),
   . . .
   act_score_one    NUMBER(3),
   act_score_two    NUMBER(3),
   act_score_three  NUMBER(3));
2. The repeating groups can be normalized out and moved into another table, like this:
CREATE TABLE ACT_SCORE (
   student_ID  NUMBER(5),
   act_score   NUMBER(3));
Contrast the above with how the repeating group might be implemented:
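One hedged possibility uses a VARRAY; the type name act_score_list and the column layout are assumed:

CREATE TYPE act_score_list AS VARRAY(3) OF NUMBER(3);

CREATE TABLE STUDENT (
   student_ID    NUMBER(5),
   student_name  full_name,
   act_score     act_score_list);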
We have defined a data structure that can use an implied subscript to reference the data. For example, to record the test scores for Don Burlington, we could enter:
UPDATE STUDENT
SET    act_score = act_score_list(300, 566, 624)
WHERE  student_name.last_name  = 'Burlington'
AND    student_name.first_name = 'Don';
We can query the act_score values by referring to the subscript of the data item, selecting the test scores for Don Burlington as follows:
SELECT act_score(1), act_score(2), act_score(3)
FROM   STUDENT
WHERE  student_name.last_name  = 'Burlington'
AND    student_name.first_name = 'Don';
This gives a basic understanding of the use of repeating values within a database object or relational table. We'll look at the advantages and disadvantages of this approach before determining when to use repeating groups.
The repeating-group approach has clear advantages. Previously, Oracle7 required the repeating data to be placed in a separate table and joined at query time. Moreover, as the above example illustrates, less disk space is consumed, because no additional table need be created to hold the ACT scores and no foreign key has to be duplicated into that new table. If a separate table is created instead, the student_ID must be redundantly stored in each and every row of the ACT_SCORE table.
If the repeating values must nevertheless be queried as a set, Oracle9i SQL provides only a rather clumsy workaround using the SQL UNION operator, creating a temporary table to hold the values in a single column, as shown:
create table temp as (
   select act_score(1) from student
   union
   select act_score(2) from student
   union
   select act_score(3) from student
);
Determining When to Use Repeating Groups

This leaves a simple central question: do the advantages of enhanced performance and disk savings justify the use of repeating groups? Let's apply both techniques to the above example to answer this question. First, we remove the repeating group of act_score and place the scores in a table called ACT_SCORE. The table can easily be queried to get the list of students with scores above 500:
SELECT student_name
FROM   STUDENT, ACT_SCORE
WHERE  ACT_SCORE.student_ID = STUDENT.student_ID
AND    act_score > 500;
When we use repeating groups, we cannot know in advance how many cells contain data, so we must test to see how many values are present. To test whether each act_score cell is NULL, we add the following special code to our example:
FOR i IN 1..3 LOOP
   IF act_score(i) IS NOT NULL THEN
      . . .
   END IF;
END LOOP;
Repeating groups can be very useful within an Oracle9i design if the repeating groups are small and have the same number of repeating values. In these cases, repeating groups greatly enhance performance by avoiding the additional work involved in joining several tables. Next, we'll see how repeating values appear in the Oracle object/relational model and discuss how they interact with ADTs.
Repeating Groups and ADTs

Repeating groups are implemented in the Oracle9i engine with the varying-array language construct of PL/SQL (the VARRAY) and are declared within table columns by using the VARRAY mechanism. Oracle9i permits repeating groups to be introduced in two ways: we can define either repeating groups of data items or repeating groups of OIDs referencing rows in another table. We'll consider OIDs first and then examine repeating groups of data values using an equivalent structure.
CREATE TYPE job_details AS OBJECT (
   job_dates          CHAR(80),
   job_employer_name  CHAR(80),
   job_title          CHAR(80),
   job_address        customer_address);

CREATE TYPE job_history AS VARRAY(3) OF REF job_details;
After defining the data types, we can use them to create the Oracle9i object:
CREATE TABLE CUSTOMER (
   customer_name  full_name,
   cust_address   customer_address,
   prior_jobs     job_history);
We have created a repeating list of references. We can now store job_history objects, capture the OIDs for these objects, and store them as reference columns in the prior_jobs column. We are now ready to extract the data from the remote object using the following DEREF statement. As we know from basic programming, when a pointer type is placed inside the DEREF function, the data corresponding to the pointer location is returned:
SELECT DEREF(CUSTOMER.prior_jobs.job_title(3))
FROM   CUSTOMER
WHERE  CUSTOMER.customer_name.last_name LIKE 'JONES%';
Accessing remote job_history rows by OID is much faster than doing so with an SQL join, as you probably suspected. However, another Oracle9i method is even faster than dereferencing an OID: a repeating group of data values can be stored directly inside the table. Let's investigate how these repeating groups of data values appear in Oracle9i.
CREATE TYPE job_details (
   job_dates          CHAR(80),
   job_employer_name  CHAR(80),
   job_title          CHAR(80),
   job_address        customer_address);

CREATE TYPE job_history AS VARRAY(3) OF job_details;
Now the CUSTOMER table can be created using the data types already defined:
CREATE TABLE CUSTOMER (
   customer_name  full_name,
   cust_address   customer_address,
   prior_jobs     job_history);

The prior_jobs column now holds a job_history VARRAY with room for three job_details entries. As we have seen, prior_jobs must be subscripted to inform the database which of the three items is wanted. This is done with the following code:
SELECT CUSTOMER.prior_jobs.job_title(3)
FROM   CUSTOMER
WHERE  CUSTOMER.customer_name.last_name LIKE 'CARSON%';
A repeating list has been created within our table definition (Figure 6.6). The street address of the first previous employer is selected by the following code:
SELECT CUSTOMER.prior_jobs.job_address.street_address(1)
FROM   CUSTOMER
WHERE  CUSTOMER.customer_name.last_name LIKE 'CARSON%';
It is important to note that repeating groups in Oracle9i can contain either data or pointers to rows within other tables. But what if the nested data types themselves have repeating groups? In prerelational databases, it was easy to create a record that contained a finite repeating group. For example, a COBOL record definition could contain three repeating groups of job history information:
03  EMPLOYEE.
    05  EMPLOYEE-NAME            PIC X(80).
    05  JOB-HISTORY OCCURS 3 TIMES.
        07  JOB-DATE             PIC X(80).
        07  JOB-EMPLOYER-NAME    PIC X(80).
        07  JOB-TITLE            PIC X(80).
        07  EMPLOYER-ADDRESS.
            09  STREET-ADDRESS   PIC X(80).
            09  CITY-ADDRESS     PIC X(80).
            09  ZIP-CODE         PIC X(80).
[Figure 6.6  The CUSTOMER table, showing customer_name, customer_address, and a repeating list of job details entries]
The JOB-HISTORY component is referenced by a subscript in COBOL, as the following examples clarify:
MOVE JOB-HISTORY(2) TO OUT-REC.
MOVE 'DATABASE ADMINISTRATOR' TO JOB-TITLE(3).
Database designers will notice that 1NF is directly violated by the use of Oracle9i VARRAYs. If repeating values within a table cell are permissible, why not allow a reference to an entirely new table? Oracle9i provides this option with the nested table. We'll look at pointing to tables in the next section.
Pointing to Tables
Imagine a database that allows nesting of tables within tables. In such a database, a single cell of one table can point to another table. While this concept might initially seem foreign, it is easier to understand if you keep in mind that many real-world objects are composed of subparts. Recall our discussion of OIDs. A pointer must be unique and permanent to establish data relationships between database entities. A relational ROWID is used to identify a row in relational databases. Because a ROWID is the number of a physical database block and row, a problem arises: the row could be moved unintentionally to another block or, even worse, accidentally deleted. Oracle created OIDs for each row to solve this problem. With OIDs, each row is always identified uniquely, regardless of the physical placement of the row or its status. An OID associated with a row will never be reused by Oracle, even if the row is deleted. To create a table with OIDs, a data type containing all the necessary row information must be created. Assume that the data type customer_stuff contains the required data structures for a customer table in the example below. The table could be created in a traditional relational database as shown:
CREATE TABLE CUSTOMER (customer_data customer_stuff);
The table creation syntax is changed slightly when using OIDs. The following example creates exactly the same table as the preceding one, except that the table contains an OID for each row within the customer table:
CREATE TABLE CUSTOMER OF customer_stuff;
The utility of the relational model has always been seriously hampered by the inability to directly represent aggregate objects. Relational views were formerly required to assemble aggregate objects, and object technology professors used to ridicule the relational model's inability to represent them. Finally, by utilizing nested ADTs, Oracle users have the ability to represent real-world objects without resorting to views. Let's see how Oracle represents this type of recursive data relationship. A TYPE definition is created for a list of orders in the following SQL. The pointers for the list of orders might become a column within an Oracle table, as indicated:
CREATE TYPE order_set AS TABLE OF order;

CREATE TYPE customer_stuff (
   customer_id            integer,
   customer_full_name     full_name,
   customer_full_address  customer_address,
   . . .
   order_list             order_set);
[Figure: Department, Course, and Section tables, with a Course row containing pointer cells (sec 1, sec 2, sec 3) to rows in the Section table]
We can see the new syntax style for table creation here. The following two table declarations are the same, except that the second establishes OIDs in the CREATE TABLE OF statement so that other tables can contain references to rows in the CUSTOMER table.

Without OIDs:

   CREATE TABLE CUSTOMER (cust_data customer_stuff);

With OIDs:

   CREATE TABLE CUSTOMER OF customer_stuff;

Either way, we can now define a pointer column called order_list in the CUSTOMER table. This pointer refers to a list of pointers, and each cell of the list contains pointers to rows in the ORDER table (Figure 6.7). Figure 6.7 shows how the pointer structure looks conceptually. However, to implement these repeating groups of pointers, object/relational databases must use internal arrays; Oracle9i uses variable-length arrays to represent this structure. In a pointer table, each column contains a nested list of pointers, and each cell within a column contains a list of pointers to rows in the ORDER table. The three orders for this customer can be prejoined with the ORDER table using object-oriented SQL extensions, as follows:
UPDATE CUSTOMER
SET    order_list =
      (SELECT REF(order)   /* this returns the OIDs from all order rows */
       FROM   ORDER);
We see that the use of the REF operator returns the reference, or OID, of the requested rows. This is similar to the retrieval of ROWIDs in a relational database, except that the reference is now stored inside a relational table. We are now ready to see how navigation between tables can be done without joining tables together. In our earlier discussion, we mentioned that the object/relational model provides two ways to retrieve data from the database: we can gather the required information either by using SQL to specify the desired data, where the access path is chosen by the SQL optimizer, or by navigating the database one row at a time. The following code returns the content of the three rows referenced by the order_list VARRAY:
SELECT DEREF(order_list)
FROM   CUSTOMER
WHERE  customer_id = 123;
The important concept in the above example is that navigation between tables can be accomplished without ever performing an SQL join. Take a moment to consider the possibilities. There is never a need to embed a foreign key for the CUSTOMER table in the order record, because the pointers can be stored in each customer row. This removes the need for a relational join between the CUSTOMER and ORDER tables, and it makes no difference, since the ability to navigate between customers and orders with pointers has been maintained. Of course, the pointers from customers to orders work in only one direction: there is no way to get from the ORDER table to the CUSTOMER table unless each order row embeds a pointer to the row containing its customer. This can be done by creating an owner reference inside each order row containing the OID of the customer who placed the order.
Let's consider how Oracle9i represents this type of recursive data relationship:
CREATE TYPE order_set AS TABLE OF ORDER;

CREATE TABLE CUSTOMER (
   customer_id            integer,
   customer_full_name     full_name,
   customer_full_address  customer_address,
   . . .
   order_list             order_set);
We see that the ORDER table is nested within the CUSTOMER table (more about nesting in the next section). Now we need to populate the new table structure, as below:
INSERT INTO CUSTOMER VALUES (
   full_name('ANDREW','S.','BURLESON'),
   customer_address('246 1st St.','Minot, ND','74635'));
We are now in a position to appreciate the performance gain. The CUSTOMER table can be prejoined with the ORDER table to add the three orders for this customer:
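A hedged sketch of that prejoin, along the lines of the earlier REF example and assuming the ORDER table carries a customer_id column (the literal 123 is illustrative):

UPDATE CUSTOMER
SET    order_list =
      (SELECT REF(o)
       FROM   ORDER o
       WHERE  o.customer_id = 123)
WHERE  customer_id = 123;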
The order_list entry in the CUSTOMER table now contains pointers to the three orders that have been placed by this customer. These pointers can be referenced without having to perform the task of a relational join:
SELECT DEREF(order_list)
FROM   CUSTOMER
WHERE  customer_id = 123;   /* This will return 3 rows in the ORDER table */
This query returns pointers to the three rows in the ORDER table. It is now a trivial matter to dereference these pointers and retrieve the contents of the ORDER table; depending on the vendor implementation of SQL, it might look something like this:
SELECT DEREF(order_list)
FROM   CUSTOMER
WHERE  customer_ID = 123;
Now it's time to look deeper into the ability of Oracle9i to nest tables.
With nested tables, a single column value within one table can be a whole other entity. In this way, structures can be created where objects (or tables) are nested within other objects (or tables). This means that the value in a single column inside a table can contain an entire table in an object/relational database. In turn, these subtables can have single column values that point to other tables, and so on, ad infinitum. This new data structure presents exciting possibilities for modeling complex aggregate objects, even though applications for it may not be obvious. Database designers can create a structure in C++ object-oriented databases, such as ONTOS and Objectivity, where an object contains a list of pointers, each of which refers to a separate list of pointers, which in turn point to other objects in the database. This structure is known as **char in C parlance: a pointer to a pointer to a character. This structure is implemented in Oracle9i with a store table. A store table is an internal table that is tightly linked to the parent table. The data storage characteristics of the parent table are inherited by the store table, including the initial extent of the table and the size of any new extent. In the highest level table, a cell is defined as a pointer to a table. Each column value within the column pointing to a whole table must contain a pointer to a table with exactly the same definition; in other words, every pointer within the column is restricted to pointing to tables with an identical definition. In practice, it may appear that each cell points to a whole table, but object/relational databases actually implement this structure with the special store table. A store table is essentially nothing more than an internal table, with a fixed set of columns, that is subordinate to the parent table. A simple example will illustrate the use of this data structure. Returning to the university database, there is a many-to-many relationship between the course and student entities: a course has many students, and a student can take many courses. This relationship would be implemented in a traditional relational system by creating a junction table between the student and course entities and copying the primary keys from the student and course tables into it. This table is called GRADE in our example, and the grade entity contains the student_ID and course_ID columns as foreign keys. Let's see how this could be implemented using pointers to whole tables. In a traditional relational implementation, to generate a class schedule for a student we would need to select the student row, join with the GRADE table, and finally join with the COURSE table, as the sketch below shows:
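This is a hedged sketch of the three-way join, with column names assumed from the discussion and an illustrative student_ID value:

SELECT STUDENT.student_name, COURSE.course_name, GRADE.grade
FROM   STUDENT, GRADE, COURSE
WHERE  STUDENT.student_ID = GRADE.student_ID
AND    GRADE.course_ID    = COURSE.course_ID
AND    STUDENT.student_ID = 12345;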
We can avoid the three-way SQL join by creating a store table subordinate to the COURSE table. This store table contains the name, address, and grade of each student enrolled in a course:
CREATE TYPE student_list (
   student_full_name     full_name,
   student_full_address  full_address,
   grade                 CHAR(1));
CREATE TYPE student_list_type AS TABLE OF student_list;

CREATE TABLE COURSE (
   course_name     VARCHAR(20),
   dept_ID         NUMBER(4),
   credit_hrs      NUMBER(2),
   student_roster  student_list_type);
We see here that the student_roster column of the COURSE table contains a pointer to a table of TYPE student_list_type. Herein lies the illusion. While it appears that each distinct column value points to a whole table, the column actually points to a set of rows within the store table. The store table is common to all of the columns that contain
182
this pointer structure. The store table also contains a special OID that points to the owner of the row in the parent table. The concept of nesting tables within tables is an excellent method for modeling hierarchical structures, but there is still some question about the utility of this data structure with actual data applications. The main advantage of nested tables is that they simplify queries, because a single SELECT statement returns the desired rows of a store table. There remains some question about whether it is better to use this structure or simply to create another table, because the nested table is still just another table with the same column structure for each row. With this basic understanding of OIDs and pointers, we are ready to look at one of the most sophisticated constructs in Oracle9i: the multidimensional array of OIDs.
CREATE TABLE SECTION OF section_type;

CREATE TYPE section_array AS VARRAY(10) OF section_type;

CREATE TYPE course_type (
   course_ID     number(5),
   course_name   varchar(20),
   credit_hours  number(2),
   section_list  section_array);
CREATE TABLE COURSE OF course_type;

CREATE TYPE course_array AS VARRAY(20) OF course_type;

CREATE TYPE dept_type (
   dept_name         varchar(20),
   chairperson_name  full_name,
   course_list       course_array);
We see that pointers allow fast access from owner to member in the hierarchy. But where are the owner pointers? A hierarchy must first be defined before we have the necessary definitions to include the pointers. Owner pointers can be added using the ALTER TYPE statement, as follows:
ALTER TYPE section_type
   ADD COLUMN course_owner_pointer      course_type;

ALTER TYPE course_type
   ADD COLUMN department_owner_pointer  dept_type;
We have now created a two-way pointer structure in which all the owner rows within the hierarchy point to the member rows and all member rows point to their owners. We must remain aware, however, that these are merely data structures; the programmer must assign these pointers when the rows are created. This Oracle9i data structure is similar to the C language **char structure mentioned earlier: the department has an array of pointers to courses, which, in turn, contain arrays of pointers to sections. The question we now face is how to query these pointers with Oracle SQL. Most object/relational vendors are implementing the CAST and MULTISET extensions to SQL to accommodate the new object features. A query to populate the student_roster internal table is implemented as shown below:
INSERT INTO COURSE (student_roster)
   (CAST
      (MULTISET
         (SELECT student_name, student_address, grade
          FROM   GRADE, STUDENT
          WHERE  GRADE.course_name  = 'CS101'
          AND    GRADE.student_name = STUDENT.student_name)));
Those accustomed to pure relational syntax may find the new SQL extensions rather foreign. With an understanding of how OIDs and VARRAYs operate within Oracle9i, we can now consider the performance implications of these constructs for the Oracle database.
In sum, the Oracle9i object extensions allow the designer to:

   Store repeating groups inside a table cell
   Nest tables within tables
   Navigate using pointers to access relational data
   Represent aggregate objects

Consider the ramifications of pointer-based navigation for processing relational data. Imagine the possibilities of being free from SQL joins: Oracle9i allows us to navigate a data model without the cumbersome task of joining tables together. The new ability to represent complex objects is even more important. It means that we can precreate aggregate objects without having to construct them from their subobjects every time we need them. Methods can be attached to aggregate objects because the objects exist independently, and precreated complex database objects are instantly available, streamlining database performance. The ability to represent real-world objects in the database is one of Oracle9i's most exciting features. Traditional relational databases were hampered by the requirement that all data be stored at the most primitive level (i.e., 3NF) and that aggregate objects be created by combining tables. This restriction has been removed by Oracle9i: aggregate objects can now be prebuilt from their components and stored within the database. Let's see how this works.
We can do the same thing for the ORDER table row since we will be retrieving one order row.
CREATE TYPE order_ref AS TABLE OF order_adt;
Therefore, the references to the customer and order rows become the first components of the ORDER_FORM object:
CREATE TABLE ORDER_FORM (
   customer  customer_ref,
   order     order_ref);
We now have the customer and order data. We still need to establish pointers to represent the many-to-many relationships between the ORDER and ITEM tables. We accomplish this by defining a repeating group of every order_line for each order:
CREATE TYPE item_list AS TABLE OF order_line;
The model lacks only pointers to every item referenced in the ORDER_LINE table. But how can we create them? The item_ID numbers of the order are unknown until the order_line rows are retrieved. Therefore, owner pointers must be established inside each order_line row so that the ITEM table can be dereferenced. Let's assume that the LINE_ITEM table has been defined to include a reference pointer to the ITEM table, as follows:
/* example of an owner pointer */
CREATE TYPE item_ref AS TABLE OF item;

CREATE TABLE LINE_ITEM (
   order_ID          integer,
   item_ID           integer,
   item_pointer      item_ref,
   quantity_ordered  integer);
Now that we understand how to create Oracle9i objects, we are ready to couple these objects using Oracle9i methods. Coupling data and methods requires careful design. The process typically begins by specifying prototypes for each of the Oracle9i methods.
This hierarchy helps us visualize how methods are nested within other methods. After the hierarchy has been developed, we can translate these processes into database classes. The lowest level DFDs represent functional primitives: processes that cannot be decomposed into smaller processes. It is obvious that functional primitive processes become methods, but does this mean that they will never have subcomponents? With the exception of standalone methods, such as a compute_shipping_charges method, a properly analyzed primitive process will never have subcomponents. Using the primitive processes, we can design a method that accepts the same values as noted on the DFD and returns those values to the calling program. For example, we might have a process called compute_shipping_charges that accepts a valid_in_stock_order as input. The process gathers the weight and cost of the items, computes the charges, and returns the shipping charge and total weight. A prototype is essentially a formal definition of a method that describes all of its input and output data flows. The accepted form for a prototype is:
return_data_type Method_name (
   input_data_name_1  input_data_type_1,
   input_data_name_2  input_data_type_2,
   . . .);
Let's review the data types that methods can use before going into further detail. These data types can be returned by a method or serve as input parameters to a method:

   INT: an integer value
   VARCHAR: a variable length character string
   TYPE: a pointer to a data structure (identical to an Oracle9i OID)

Object novices are often confused by TYPE because it refers to pointers. A pointer in an object database is nothing more than an OID pointing to an object that supplies method values. The various types of OIDs must be carefully differentiated, because Oracle9i supports strong data typing in the SCOPE clause of the CREATE TABLE statement. For example, a pointer to an order (*order) is quite different from a pointer to a customer (*customer) object. It is more efficient to pass pointers than the data itself because the OID pointer is more compact.
As mentioned, Oracle9i fully supports strong typing. Oracle9i uses the SCOPE verb in the CREATE TABLE statement to limit the type of a reference column to a particular OID table. For example, if a customer table is defined with a VARRAY of OIDs consisting of customer orders, the SCOPE clause can be used to ensure that only OIDs from the ORDER table are stored within these columns. In object database parlance, a prototype is designed for each process in the DFD. This is illustrated by examining how the prototype is designed for the compute_shipping_charges method. According to the DFD, compute_shipping_charges accepts a valid_in_stock_order and outputs the shipping_charge for the order. Therefore, the prototype could return an integer (the shipping charge, defined as INT) from compute_shipping_charges and accept a pointer to an order object:
INT compute_shipping_charge(valid_in_stock_order *order);
Here we see the prototype details:

   INT: the returned value will be an integer.
   compute_shipping_charge: the name of the method.
   valid_in_stock_order: the first parameter passed to the method.
   *order: the second parameter passed; the * indicates a pointer data type, pointing to an ORDER object.

We assume for the purpose of this example that the valid_in_stock_order contains the following four values, which the process requires to compute the shipping charges:

1. Weight in pounds
2. Desired class of shipping
3. Origination zip code
4. Destination zip code
How can the data items be retrieved when the order is represented by an OID? The method retrieves the data by dereferencing the OID and accessing the order object. In other words, the method captures the OID and issues SQL to retrieve the data items from the object. The SQL within the compute_shipping_charges method might look like this:
select
   item_weight,
   shipping_class,
   origination_zip_code,    /* column names beyond the first two are illustrative */
   destination_zip_code
from
   ORDER
where
   ORDER.oid = :valid_in_stock_order;
The function above yields the shipping charge, expressed as an integer. If the method has no pointer to the order object, the prototype for compute_shipping_charges becomes far more complicated, as the following shows:
INT compute_shipping_charge (
   weight           int,
   class            char(1),
   origination_zip  char(5),    /* one parameter per input value listed above */
   destination_zip  char(5));
Note that INT refers to the data type of the value returned by the method. If the method does not return a value, INT is replaced by VOID. For example, a method called give_raise would not return a value and could be prototyped as:
VOID give_raise(emp_ID number(9), percentage int);
Armed with this basic understanding of prototyping, we are ready to prototype some methods. We need to know the names and data types of all input data, as well as the name and data types of the returned value. These are generally derived from the DFDs in the initial systems analysis.
*order         fill_order(cust_info *customer);
int            check_customer_credit(cust_info *customer);
int            check_inventory(item_number int);
*invoice       prepare_invoice(valid_in_stock_order *order_form);
int            check_stock_level(item_number int);
*backorder     generate_backorder_request(item_number int);
void           decrement_inventory(item_number int);
*packing_slip  prepare_packing_slip(valid_in_stock_order *order_form);
int            compute_order_cost(valid_in_stock_order *order_form);
int            compute_shipping_charges(valid_in_stock_order *order_form);
int            add_handling_charge(total_weight int);
Let's walk through these prototypes to become more comfortable with the definitions. We see that some methods return an integer, one returns nothing (void), and others return object pointers. It is not uncommon to combine assignment statements with method calls in object-oriented databases. For example, the following process code computes the shipping charges for the order and assigns the result to a variable called my_shipping_charges:
my_shipping_charges = compute_shipping_charges(:my_order_form_OID);
In the same way, a method call can also return an OID, which means that an OID can be embedded into another object. We assume in the code below that the data type for order_OID has been defined as a pointer to an order. Two things can now be done in a single statement: the fill_order method is invoked and, simultaneously, the OID of the new order object is returned into the order_OID variable, as follows:
order_OID = fill_order(:cust_info);
The name and data type of every input and output variable has been specified for each method, and each method can be tested independently. Internal variables remain unknown to the calling method; this is called information hiding, and it applies whenever private variables are declared within a method. One of the goals of object-oriented databases is to make each method a reusable procedure that can always be counted on to function properly. This is the foundation of object method design. Let's introduce the Oracle9i model into this system. It should be obvious by now that several components are used to describe an object/relational database design. First, the object/relational model must be delineated for the base objects. Figure 6.9 displays the base classes in the order processing system and describes the indexes, tablespaces, and subclasses for each class definition. Now take a look at the aggregate class diagram shown in Figure 6.10. Here, we see two aggregate class definitions, their internal pointer structures, and the index and tablespace information for all classes that are composed of pointers to other objects.
[Figure 6.9  Base classes for the order processing system: Customer (name, phone; display_cust_list()), Order (order number, order date; check_credit(), check_inventory(), generate_invoice()), and Line-item (item number, item name, quantity, price; add_line_item(), delete_line_item())]
Note that the models show the base classes as well as the aggregate classes. The problem now is mapping the method prototypes onto these classes. Since all objects are represented as tables in the object/relational model, the aggregate object allows the owner table to be coupled with the aggregate methods; in this way, the methods are associated with the aggregate object.
Figure 6.10 The Aggregate Class Diagram for the Order Processing System
However, more complex methods can be linked to their target objects. For example, an order_form object might contain a method called check_payment_history, which performs detailed checks into the prior payment history of a customer placing an order. Let's analyze the methods that might be associated with these objects. If a method of the same name appears in multiple class definitions, the database first references the object and then searches for the method up the class hierarchy. The following example shows some methods that might be associated with each type of student object.
Methods for student:
   display_student();
   compute_tuition();
   enroll_student();

Methods for graduate_student:
   assign_mentor();
   compute_tuition();
   update_thesis_status();
We see that some methods appear only within the subclass definition because they are unique to the subclass. For example, update_thesis_status would have no meaning for an undergraduate student. We have provided a general method for mapping processes to their database objects, and we have emphasized that careful method planning occurs before a database schema is defined. This planning is critical in the design of an object database, and the importance of careful method placement cannot be overemphasized. Method planning is crucial in Oracle9i because many types of an object may be defined within the class hierarchies, each having identical method names but vastly different processing. To illustrate, a method called compute_mileage might exist for both a sailboat class and an automobile class. The internals for the sailboat would use nautical miles, while the automobile would use statute miles. Oracle9i allows a new method to be created that differentiates between the types of object used in either case. The new method will be known only to objects within that class and its subclasses; objects belonging to other classes will never know that the new method exists. This is called overloading. Overloading is extremely powerful because new code can be introduced into a system with absolute certainty that no unintended side effects will occur. Now that we understand the performance implications of the Oracle9i object features for our database design, let's look at the design considerations within the base Oracle engine. Bear in mind that the most important factor in database performance is proper design; no amount of tuning will correct a poorly designed database.
Stored Procedures and Oracle Tables

Objects such as stored procedures and triggers are becoming more popular, moving application code away from external programs and into the database engine. Oracle encouraged this trend in anticipation of the object-oriented features introduced in Oracle8. However, the Oracle DBA must be conscious of the increasing memory demands of stored procedures and plan carefully for the eventual storage of all database access code within the database.
Most Oracle databases today have only a small amount of code in stored procedures, but this is changing rapidly. Many compelling benefits accrue from placing all Oracle SQL inside stored procedures. These benefits include:

   Improved performance: stored procedures are loaded once into the SGA and remain there unless they are paged out. Subsequent executions of a stored procedure are far faster than external code.

   Coupling of data with behavior: relational tables can be coupled with their associated behaviors by using naming conventions, and Oracle9i gives us the ability to store procedures that are directly associated with a database table through the use of methods. For example, if all behaviors associated with the EMPLOYEE table are prefixed with the table name (e.g., EMPLOYEE.hire, EMPLOYEE.give_raise), then the data dictionary can be queried to list all behaviors associated with a table (e.g., SELECT * FROM DBA_OBJECTS WHERE OWNER = 'EMPLOYEE') and code can be readily identified and reused.

   Isolation of code: all SQL is moved out of the external programs and into stored procedures, so the application program becomes nothing more than a call to a stored procedure. This makes it a simple matter to swap one database for another.

Stored procedures and triggers function faster than traditional code, primarily because of Oracle's SGA. Once a procedure has been loaded into the SGA, it remains in the library cache until it is paged out of memory according to a least recently used (LRU) algorithm. The procedure executes quickly once it has been loaded into the RAM of the shared pool; the trick is to prevent pool thrashing during periods when many procedures compete for a limited amount of library cache within the shared pool memory. Two init.ora parameters are more important than all other parameters combined for tuning Oracle: db_block_buffers and shared_pool_size. These two parameters define the size of the in-memory region that Oracle consumes on startup and determine the amount of storage available to cache data blocks, SQL, and stored procedures. Oracle also provides the package construct. A package is essentially a collection of stored procedures and functions that can be organized in various ways. For example, the stored procedures and functions for employees can be logically grouped together in an employee package:
CREATE PACKAGE EMPLOYEE AS
   FUNCTION  compute_raise_amount (percentage NUMBER) RETURN NUMBER;
   PROCEDURE hire_employee;
   PROCEDURE fire_employee;
   PROCEDURE list_employee_details;
END EMPLOYEE;
The code above creates a package that encapsulates all employee behaviors (Oracle functions and stored procedures) into a single unit that is added to Oracle's data dictionary. Stored procedures place the SQL directly into the database and out of the external application programs, which are reduced to mere procedure calls. As systems increasingly place process code within stored procedures, the shared pool becomes important. The shared pool consists of the following subpools:

   Dictionary cache
   Library cache
   Shared SQL areas
   Private SQL areas (these exist during cursor open/cursor close)
   Persistent area
   Runtime area

We mentioned that the shared pool uses an LRU algorithm to determine which objects are paged out of the shared pool. As this paging occurs, fragments, or discontiguous chunks of memory, are created within the shared pool. This means that a large procedure that originally fit into memory may no longer fit into contiguous memory when it is reloaded after paging out. A problem can occur when the body of a package has been paged out of the SGA due to more recent (or more frequent) activity and the server cannot find enough contiguous memory to reload it, resulting in an ORA-4031 error. Paging can be avoided in Oracle by pinning packages in the SGA.
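As a sketch, a package can be pinned with the supplied DBMS_SHARED_POOL package (created by the dbmspool.sql script); the package name EMPLOYEE follows the earlier example:

EXECUTE DBMS_SHARED_POOL.KEEP('EMPLOYEE', 'P');
   /* 'P' pins a package, procedure, or function */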
CONCLUSION
This chapter has dealt with the wealth of physical implementations of Oracle table storage structures with an eye toward using the appropriate physical structure to match the logical design model. We are now ready to conclude this book with a discussion of Oracle index design methods.
7
ORACLE INDEX DESIGN
INTRODUCTION
In Oracle, an index is used to speed up the time required to access table information. Internally, Oracle indexes are B-tree data structures in which each tree node can contain many sets of key values and ROWIDs. In general, Oracle indexes exist for the purpose of preventing full-table scans (FTSs). FTSs create two problems. The main problem is the time lost in servicing a request, as each and every row of a table is read into Oracle's buffer pool. In addition to hurting the performance of the invoking task, a FTS degrades performance at the system level: all other tasks on the system might have to incur additional I/O because the buffer blocks held by competing tasks will have been flushed by the FTS. As blocks are flushed from the buffer pool, other tasks must incur additional I/Os to reread information that would have remained in the buffer pool had the FTS not been invoked. Almost any Oracle table can benefit from the use of indexes. The only exception to this rule is a small table that can be read in fewer than two block I/Os. Two block I/Os are used as the guideline because Oracle needs at least one I/O to access the root node of the index tree and another I/O to retrieve the requested data. For example, assume that a lookup table contains rows of 25 B each and Oracle is configured to use 4 K block sizes. Because each data block would hold about 150 rows, using an index for up to 300 rows would not make processing any faster than a FTS. Let's start with a review of Oracle index design basics and then move on to the design of indexes for high-speed data access within Oracle.
[Figure: An Oracle B-tree index, showing the root node and the Level 2 and Level 3 index nodes]
Oracle offers several options when creating an index using the default B-tree structure. It allows you to index on multiple columns (concatenated indexes) to improve access speeds, and it allows individual columns within the index to be sorted in different orders. For example, we could create a B-tree index with a last_name column in ascending order and a second salary column sorted in descending order:
create index name_salary_idx
on person (
   last_name  asc,
   salary     desc);
While B-tree indexes are great for simple queries, they are not very good in the following situations:

   Low-cardinality columns: columns with fewer than 200 distinct values do not have the selectivity required to benefit from standard B-tree index structures.

   No support for SQL functions: B-tree indexes cannot support SQL queries that use Oracle's built-in functions. Oracle provides a variety of built-in functions that allow SQL statements to query on a piece of an indexed column or on any one of a number of transformations against the indexed column.

Prior to the introduction of Oracle function-based indexes (FBIs), the Oracle cost-based SQL optimizer had to perform time-consuming long-table FTSs due to these shortcomings. Consequently, it was no surprise when Oracle introduced more robust types of indexing structures.
Bitmapped Indexes
Oracle bitmap indexes are very different from standard B-tree indexes. In bitmap structures, a two-dimensional array is created with one column for every distinct value in the column being indexed and one entry for every row in the table; the array therefore represents each distinct value multiplied by the number of rows. At row retrieval time, Oracle decompresses the bitmap into the RAM data buffers so it can be rapidly scanned for matching values. These
matching values are delivered to Oracle in the form of a ROWID list, and these ROWID values may directly access the required information. The real benefit of bitmapped indexing occurs when one table includes multiple bitmapped indexes. Each individual column may have low cardinality, but the creation of multiple bitmapped indexes provides a powerful method for rapidly answering difficult SQL queries. For example, assume there is a motor vehicle database with numerous low-cardinality columns such as car_color, car_make, car_model, and car_year. Each column contains fewer than 100 distinct values by itself, and a B-tree index would be fairly useless in a database of 20 million vehicles. However, combining these indexes in a query can provide blistering response times, far faster than the traditional method of reading each of the 20 million rows in the base table. For example, assume we wanted to find old blue Toyota Corollas manufactured in 1981:
select license_plate_nbr
from   vehicle
where  color = 'blue'
and    make  = 'toyota'
and    year  = 1981;
Oracle uses a specialized optimizer method called a bitmapped index merge to service this query. In a bitmapped index merge, each ROWID list is built independently by using the bitmaps, and a special merge routine compares the ROWID lists and finds the intersecting values. Using this methodology, Oracle can provide subsecond response time when working against multiple low-cardinality columns (Figure 7.2).
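As a sketch, the underlying bitmap indexes for this query might be declared as follows (the index names are assumed):

create bitmap index vehicle_color_idx on vehicle (color);
create bitmap index vehicle_make_idx  on vehicle (make);
create bitmap index vehicle_year_idx  on vehicle (year);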
Function-Based Indexes
One of the most important advances in Oracle indexing is the introduction of function-based indexing. FBIs allow the creation of indexes on expressions, internal functions, and user-written functions in PL/SQL and Java. FBIs ensure that a query whose WHERE clause applies a function to an indexed column can still use an index. Before FBIs were available, applying a built-in function to an indexed column invalidated the index, and consequently Oracle would perform the
dreaded FTS. Examples of SQL with function-based queries might include the following:
select * from customer where substr(cust_name,1,4)    = 'BURL';
select * from customer where to_char(order_date,'MM') = '01';
select * from customer where upper(cust_name)         = 'JONES';
select * from customer where initcap(first_name)      = 'Mike';
Remember, Oracle always interrogates the WHERE clause of the SQL statement to see if a matching index exists and then evaluates the cost to see if the index is the lowest-cost access method. By using FBIs, the Oracle designer can create an index that exactly matches the predicates within the SQL WHERE clause, ensuring that the query is serviced with a minimal amount of disk I/O and the fastest possible speed.
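For instance, a matching FBI for the upper() predicate above might be declared like this (the index name is assumed):

create index cust_name_upper_idx
on customer (upper(cust_name));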
Index-Organized Tables

Beginning with Oracle8, Oracle recognized that a table with an index on every column did not require table rows. In other words, by using a special table access method called an index fast full scan, the index can be queried without actually touching the data itself. Oracle codified this idea with the index-organized table (IOT) structure. When using an IOT, Oracle does not create the actual table but instead keeps all of the required information inside the Oracle index. At query time, the Oracle SQL optimizer recognizes that all of the values necessary to service the query exist within the index tree. The cost-based optimizer then has a choice: either read through the index tree nodes to pull the information in sorted order, or invoke an index fast full scan, which reads the table in the same fashion as a FTS, using sequential prefetch (as defined by the db_file_multiblock_read_count parameter). The multiblock read facility allows Oracle to quickly scan index blocks in linear order, reading every block within the index tablespace. The listing below shows an example of the syntax to create an IOT:
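This is a sketch only; the table, column, and tablespace names are assumed:

create table emp_iot (
   emp_id  number,
   ename   varchar2(20),
   sal     number(9,2),
   constraint emp_iot_pk primary key (emp_id))
organization index
tablespace ts_emp;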
Oracle dominates the market for relational database technology, so Oracle designers must be aware of these specialized index structures and fully understand how they can be used to improve the performance of Oracle SQL queries. Many of these techniques are discussed in my book Oracle High-Performance SQL Tuning (Oracle Press, 2001), which details the process of creating all of Oracle's index tree structures and offers specialized tips and techniques for ensuring that SQL queries are serviced using the fastest and most efficient indexing structure.
In practice, many Oracle SQL tuning professionals will resequence the table rows into the same physical order as the primary index. This technique can reduce disk I/O on index range scans by several orders of magnitude.
Oracle enhanced the fast full-index scan to make it behave similarly to a FTS. Just as Oracle has implemented the db_file_multiblock_read_count parameter for FTSs, Oracle allows this parameter to take effect when retrieving rows for a fast full-index scan. Since the whole index is accessed, Oracle allows multiblock reads. There is a huge benefit to not reading the table rows, but there are some requirements for Oracle to invoke the fast full-index scan:

   All of the columns required must be specified in the index. That is, all columns in the SELECT and WHERE clauses must exist in the index.
   The query returns more than 10 percent of the rows within the index. This 10 percent figure depends on the degree of multiblock reads and the degree of parallelism.
   You are counting the number of rows in a table that meet a specific criterion. The fast full-index scan is almost always used for count(*) operations.

You can also force a fast full-index scan by specifying the index_ffs hint, which is commonly combined with the parallel_index hint to improve performance. For example, the following query forces the use of a fast full-index scan with parallelism:
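The listing that follows is a sketch, with the table, alias, and index names assumed:

select /*+ index_ffs(e, emp_pk) parallel_index(e, emp_pk) */
       count(*)
from   emp e;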
Because of all the variables involved, it is not always intuitive whether a fast full-index scan is the fastest way to service a query. Hence, most expert SQL tuners will time any query that meets the fast full-index scan criteria and see if the response time improves. If you plan to use the Oracle Parallel Query facility, all tables specified in the SQL query must be optimized for a FTS. If an index exists, the cost-based optimizer must be given a hint to invalidate the index in order to use parallel query. For the rule-based optimizer, indexes can be turned off by applying an Oracle function in the WHERE clause. One important concept in indexing is selectivity: the uniqueness of the values in a column. To be most effective, an index column must have many unique values. Columns having only a few values (e.g., sex = m/f, status = y/n) are not good candidates for traditional Oracle B-tree indexing, but they are ideal for bitmapped indexes. For a B-tree index, the sparse distribution of values would be less efficient than a FTS. To see the selectivity for a column, compare the total number of rows in the table with the number of distinct values for the column, as follows:
SELECT count(*) FROM CUSTOMER;

SELECT count(DISTINCT status) FROM CUSTOMER;
Another concept used in indexing is distribution, which refers to the frequency with which each unique value occurs within a table. For example, let's say you have a state_abbreviation column that contains 1 of 50 possible values. This is acceptable as an index column, provided that the state abbreviations are uniformly distributed across the rows. However, if 90 percent of the values are for New York, the index will not be effective. Oracle addresses the index data distribution issue with the ANALYZE TABLE command. When using Oracle's cost-based SQL optimizer, ANALYZE TABLE looks at both the selectivity and the distribution of the column values; if they are found to be out-of-bounds, Oracle can decide not to use the index. Oracle also provides a view called DBA_HISTOGRAMS that tells the cost-based optimizer about the distribution of values within a column. The purpose of a histogram is to provide a clear picture of the distribution of values within a low-cardinality index. Unfortunately, getting the histogram data requires each and every index column to be analyzed. Most Oracle designers favor bitmapped indexes over B-tree indexes for low-cardinality columns.

Remember, indexes are a valuable shortcut for the Oracle optimizer, and careful design will help reduce unnecessary logical I/O at query runtime. Oracle recommends the following guidelines when considering whether to index a column:

- Columns frequently referenced in SQL WHERE clauses are good candidates for an index.
- Columns used to join tables (primary and foreign keys) should be indexed.
- Columns with poor selectivity should not be indexed using a B-tree; any column with less than 10 percent unique values should be indexed as a bitmap index.
- Frequently modified columns are not good candidates for indexing, because excessive processing is necessary to maintain the structure of the index tree.
- Columns used in SQL WHERE clauses with Oracle functions or operators should not be indexed. For example, an index on last_name will not be effective if it is referenced in the SQL as upper(last_name).
- When using RI, always create an index on the foreign key (see the example below).

Most programmers do not realize that database deadlocks occur frequently within database indexes. It is important to note that a SELECT of a single row can cause more than one lock entry to be placed in the storage pool, because all affected index rows are also locked. In other words, the individual row receives a lock, but each index node that contains the value for that row will also have a lock assigned. If the last entry in a sorted index is retrieved, the database will lock all index nodes that reference the indexed value, in case the user changes that value. Because many indexing schemes always carry the high-order key in multiple index nodes, an entire branch of the index tree can be locked, all the way up to the root node of the index. While each database's indexing scheme is different, some relational database vendors recommend that tables with ascending keys be loaded in descending order, so the rows are loaded from Z to A on an alphabetical key field. Other databases, such as Oracle, recommend that indexes be dropped and re-created after rows have been loaded into an empty table.
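Returning to the foreign key guideline above, here is a minimal sketch of indexing a foreign key (the table, column, and tablespace names are illustrative):

-- Index the foreign key that joins CUSTOMER to DEPT
create index cust_dept_fk_idx
   on customer (dept_name)
   tablespace ts_indexes;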
When an UPDATE or DELETE is issued against a row that participates in an index, the database will attempt an exclusive lock on the row. This attempt requires the database to check whether any shared locks are held against the row, as well as against any index nodes that will be affected. Many indexing algorithms allow the index tree to dynamically change shape, spawning new levels as items are added and condensing levels as items are deleted. However, indexes are not free: they require additional disk space, and a table with an index on each column will have indexes that consume more space than the table they support. Oracle must also update indexes at runtime as rows are deleted, added, or modified, and this index maintenance can cause considerable performance degradation. For example, adding a row to the end of a table will cause Oracle to adjust the high-key value for each affected node in the index.

Another guideline for determining when to use an index involves examining the SQL issued against a table. In general, if the SQL can be collected, then each column supplied in a SQL WHERE clause is a candidate for indexing. Another common approach for determining where to create indexes is to run an EXPLAIN PLAN for all SQL and carefully look for any FTSs. The Oracle cost-based optimizer will sometimes perform a FTS even if an index has been defined for the table; this occurs most commonly when issuing complex n-way joins. If you are using rule-based optimization in Oracle, the structure of an SQL statement can be adjusted to force the use of an existing index. For Oracle's cost-based optimizer, adding hints to the SQL can ensure that all indexes are used. The cost-based optimizer sometimes chooses a FTS when an index scan would be more efficient.

Indexes do much more than speed up an individual query. When FTSs are performed on a large Oracle table, the buffer pool begins to page out blocks from other queries. This causes additional I/O for the entire database and results in poor performance for all queries, not just the offending FTS. Indexes are never a good idea for long descriptive columns. A column called customer_description would be a poor choice for an index because of its length and the inconsistency of the data within the column. Such a column would also usually be referenced in SQL using Oracle extensions such as SUBSTR, LIKE, and UPPER; remember, these Oracle extensions invalidate the index. Suppose an index has been created on customer_last_name. The first of the following queries can use the index, while the second cannot (the literals are illustrative):
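-- This predicate references the indexed column in its pure form
-- and can use the index:
select *
from customer
where customer_last_name = 'Jones';

-- Wrapping the column in a function invalidates the index:
select *
from customer
where upper(customer_last_name) = 'JONES';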
Unlike other relational databases such as DB2, Oracle cannot physically load a table in key order. Consequently, administrators can never guarantee that the rows in a table will be in any particular order. The use of an index can help whenever the SQL ORDER BY clause is used. For example, even a query with no WHERE conditions can benefit from an index when it contains an ORDER BY clause. Consider the following SQL:
SELECT
   customer_last_name,
   customer_first_name
FROM
   CUSTOMER
ORDER BY
   customer_last_name,
   customer_first_name;
Here, building a multivalued index on customer_last_name and customer_first_name will alleviate the need for an internal sort of the data, significantly improving the performance of the query:
CREATE INDEX cust_name
   ON CUSTOMER (customer_last_name, customer_first_name);
Now that we understand the basic index design principles, let's explore the use of indexes for high-speed systems. We will cover index parallelism, compression, and techniques to reduce index disk I/O.
But what if your goal is to minimize computing resources? If this SQL is inside a batch program, then it is not important to start returning rows quickly, and a different execution plan would take fewer resources. In this case, a parallel FTS followed by a back-end disk sort will require less machine power and far less I/O, because blocks do not have to be reread to deliver the rows in sorted order (Figure 7.3). We expect the result to take longer to deliver, since no rows are returned until the sort is complete. Let's assume that this execution plan delivers the result in 10 seconds with 5,000 db_block_gets.
Nologging Option
The nologging option bypasses the writing of the redo log, significantly improving performance. The only danger in using nologging is that you must rerun the CREATE INDEX syntax if you perform a roll-forward database recovery. Using nologging with CREATE INDEX can speed index creation by up to 30 percent:
CREATE INDEX cust_dup_idx
   ON customer (sex, hair_color, customer_id)
   PARALLEL 35
   NOLOGGING;
Compress Option

The compress option allows you to specify the prefix length for multiple-column indexes. In this example, we have a nonunique index on several low-cardinality columns (sex and hair_color) and a high-cardinality column (customer_id):
CREATE INDEX cust_dup_idx
   ON customer (sex, hair_color, customer_id)
   PARALLEL 35
   NOLOGGING
   COMPRESS 2;
In summary, there are many parameters that you can use to improve the performance of Oracle index creation, the size of the index tree, and the height of the tree structure. Now let's look at how you can adjust your block size when designing indexes to reduce disk I/O.
The ERADMIN.ADMISSION table has 150,000 rows and an index built on the PATIENT_ID column. An EXPLAIN PLAN of the test query, an aggregate against the indexed column, reveals that it uses a fast full-index scan to produce the desired end result:
Execution Plan
----------------------------------------------------------
   0    SELECT STATEMENT Optimizer=CHOOSE (Cost=41 Card=1 Bytes=4)
   1  0   SORT (AGGREGATE)
   2  1     INDEX (FAST FULL SCAN) OF 'ADMISSION_PATIENT_ID'
            (NON-UNIQUE) (Cost=41 Card=120002 Bytes=480008)
Executing the query (twice to eliminate parse activity and to cache any data) with the index residing in a standard 8 K tablespace produces these runtime statistics:
Statistics
---------------------------------------------------
     0  recursive calls
     0  db block gets
To test the effectiveness of the new 16 K cache and 16 K tablespace, the index used by the query will be rebuilt into the 16 K tablespace that has the exact same characteristics as the original 8 K tablespace, except for the larger block size:
alter index eradmin.admission_patient_id
   rebuild
   nologging
   noreverse
   tablespace indx_16k;
Once the index is nestled firmly into the 16 K tablespace, the query is reexecuted (again, twice) with the following runtime statistics being produced:
Statistics
---------------------------------------------------
     0  recursive calls
     0  db block gets
   211  consistent gets
     0  physical reads
     0  redo size
   371  bytes sent via SQL*Net to client
   430  bytes received via SQL*Net from client
     2  SQL*Net roundtrips to/from client
     0  sorts (memory)
     0  sorts (disk)
     1  rows processed
As you can see, the number of logical reads has been cut in half simply by using the new 16 K tablespace and accompanying 16 K data cache. Clearly, the benefits of properly using the new data caches and multiblock tablespace features of Oracle9i and above are worth investigating and testing in your own database. Next, let's see how the Oracle cost-based optimizer influences our index design decisions.
[Figure: histogram buckets, contrasting equal-width ranges (1-20, 21-40, 41-60, 61-80, 81-100) with height-balanced ranges (1-24, 25, 25, 26-90, 91-100)]
For example, assume that we have a five-way table join where the result set will be only 10 rows. Oracle will want to join the tables together in such a way as to make the result set (cardinality) of the first join as small as possible. By carrying less baggage in the intermediate result sets, the query will run faster. To minimize intermediate results, the optimizer attempts to estimate the cardinality of each result set during the parse phase of SQL execution. Having histograms on skewed columns will greatly aid the optimizer in making a proper decision. (Remember, you can create a histogram even if the column does not have an index and does not participate as a join key.)

Because a complex schema might have tens of thousands of columns, it is impractical to evaluate each column for skew, so Oracle provides an automated method for building histograms as part of the dbms_stats utility. By using the method_opt=>'for all columns size skewonly' option of dbms_stats, you can direct Oracle to automatically create histograms for those columns whose values are heavily skewed. We'll take a look at this option in more detail later.

As a general rule, histograms are used to predict the cardinality, the number of rows returned in the result set. For example, assume that we have a product_type index and 70 percent of the values are for the HARDWARE type. Whenever SQL with where product_type = 'HARDWARE' is specified, a FTS is the fastest execution plan, while a query with where product_type = 'SOFTWARE' would be fastest using index access. Because histograms add overhead to the parsing phase of SQL, you should avoid them unless they are required for a faster optimizer execution plan. There are, however, several conditions where creating histograms is advised:

- When the column is referenced in a query. There is no point in creating histograms if the queries do not reference the column. This mistake is common: many DBAs will create histograms on a skewed column even though it is not referenced by any queries.
- When there is a significant skew in the distribution of column values. The skew should be significant enough that the value in the WHERE clause will make the optimizer choose a different execution plan.
- When the column values cause an incorrect assumption. If the optimizer makes an incorrect guess about the size of an intermediate result set, it may choose a suboptimal table join method. Adding a histogram to this column will often provide the information the optimizer needs to use the best join method.
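As a preview, here is a minimal sketch of the skewonly call (the schema name is illustrative):

begin
   dbms_stats.gather_schema_stats(
      ownname    => 'SCOTT',
      method_opt => 'for all columns size skewonly',
      cascade    => true);
end;
/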
Now that we understand histogram design, let's examine how the physical row order influences Oracle's choice of index usage.
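The key statistic here is the clustering_factor column of the dba_indexes view. When the clustering_factor approaches the number of blocks in the table, the rows are in the same physical sequence as the index, and index range scans are cheap. A sketch of how to pull the relevant statistics from the standard dictionary views (the table name is illustrative):

select
   i.index_name,
   i.clustering_factor,
   t.blocks,
   t.num_rows,
   t.avg_row_len
from
   dba_indexes i,
   dba_tables  t
where
   i.owner      = t.owner
and i.table_name = t.table_name
and t.table_name = 'CUSTOMER';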
Conversely, a high clustering_factor, where the value approaches the number of rows in the table (num_rows), indicates that the rows are not in the same sequence as the index, and additional I/O will be required for index range scans. Even if a column has high selectivity, a high clustering_factor combined with a small avg_row_len indicates that the column values are randomly distributed across the table, and extra I/O will be required to fetch the rows. In these cases, an index range scan would cause a huge amount of unnecessary I/O (Figure 7.6); a FTS would be far more efficient.

In summary, the clustering_factor, db_block_size, and avg_row_len all influence the optimizer's decision between a FTS and an index range scan, and it is important to understand how these statistics are used by the optimizer. Now that we understand the basics of Oracle indexing and design for performance, let's move on to look at how indexes interact with RI and examine the performance implications of these features.
Declaring a unique constraint on the CUSTOMER table for the cust_id column (Listing 7.1) will create an index on the field, so it is not necessary to manually build one. Note that you should always specify the storage location when declaring constraints. In the previous example, had the cust_ukey constraint been defined without the STORAGE clause, the index would have been placed in the table owner's DEFAULT tablespace, with whatever default storage parameters were in effect for that tablespace. In Listing 7.1, the first constraint is on the cust_nbr column, the primary key. When you use Oracle's RI to specify a primary key, Oracle automatically builds a unique index on the column to ensure that no duplicate values are entered.
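A sketch of this technique, assuming the cust_ukey constraint on cust_id (the tablespace and storage values are illustrative):

alter table customer
   add constraint cust_ukey unique (cust_id)
   using index
      tablespace ts_indexes
      storage (initial 512k next 512k);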
The second constraint in the listing above is on the dept_name column of the DEPT table. By default, a foreign key constraint tells Oracle that it cannot remove a department row while existing customer rows reference that department; with ON DELETE CASCADE, Oracle will instead delete all customer rows referencing a department when the department row is deleted. The next RI constraint, on organization_name, ensures that no organization is deleted while customers are participating in that organization. ON DELETE RESTRICT tells Oracle not to delete an organization row if any customer row still references the organization; only after each and every customer has been assigned to another organization can the row be deleted from the organization table. The last constraint shown in the listing above is a check constraint. Using a check constraint, Oracle will verify that the column is one of the valid values before inserting the row, but it will not create an index on the column.

In addition to basic indexes, Oracle8 allows an index to contain multiple columns. This ability can greatly influence the speed at which certain types of queries function within Oracle.
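For example, consider a concatenated index on the customer name columns and a query that references only those columns (a sketch; the index name is illustrative):

create index cust_name_idx
   on customer (customer_last_name, customer_first_name);

select
   customer_last_name,
   customer_first_name
from
   customer
order by
   customer_last_name;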
If this query were issued against the CUSTOMER table, Oracle would never need to access any rows in the base table. Because all key values are contained in the index and the high-order key (customer_last_name) is in the ORDER BY clause, Oracle can scan the index and retrieve data without ever touching the base table. With the assistance of this feature, the savvy Oracle developer can add columns to the end of the concatenated index so that the base table is never touched. For example, if the preceding query also returned the value of the customer_address column, this column could be added to the concatenated index, dramatically improving performance.

In summary, the following guidelines apply when creating a concatenated index:

- Use a composite index whenever two or more values are used in the SQL WHERE clause and the operators are ANDed together.
- Place the columns in the WHERE clause in the same order as in the index, with data items added at the end of the index.

Now that we understand the basic constructs of Oracle indexes, let's take a closer look at the SQL optimizer and examine how it chooses which indexes to use to service SQL requests.
  AND amount_due > 1000
  AND state = 'IOWA'
  AND job_description LIKE lower('%computer%');
Here, you can see a query where a FTS would be the most efficient processing method. Because of the complex conditions and the use of Oracle extensions in the SQL, it might be faster to perform a FTS. However, the fast execution of this task might come at the expense of other tasks on the system as the buffer pool becomes flushed. In general, the type of optimizer will determine how indexes are used. As you probably know, the Oracle optimizer can run as either rule-based or cost-based. As a general rule, Oracle is intelligent enough to use an index if it exists, but there are exceptions. The most notable exception is the n-way join with a complex WHERE clause: the cost-based optimizer, especially when the all_rows mode is used, can get confused and invoke a FTS on at least one of the tables, even if the appropriate foreign key indexes exist. The only remedy to this problem is to use the rule-based optimizer or the first_rows mode of the cost-based optimizer. Always remember, Oracle will only use an index when the index column is specified in its pure form. The use of SUBSTR, UPPER, LOWER, and other functions will invalidate an index. However, there are a few tricks to help you get around this obstacle. Consider the following two equivalent SQL queries:
SELECT *
FROM CUSTOMER
WHERE total_purchases/10 > 5000;

SELECT *
FROM CUSTOMER
WHERE total_purchases > 5000*10;
The second query, by virtue of the fact that it does not alter the index column, can use an index on the total_purchases column. While index usage has an impact on database performance, the way that Oracle8 tables are allocated can also influence the performance of systems, especially those that have a high amount of updating. Let's take a look at some of the most important considerations when allocating Oracle8 tables.
The star optimizer replaces the WHERE clauses of the query, substituting for each equi-join criterion a subquery using the IN clause.
We see a similar transformation in the join into the time table. Here is the month clause before the star transformation:
where sales.month = time.month
  and time.month in ('01-03', '01-04')
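After the transformation, the month clause takes this general form (a reconstruction based on the description below, since the transformed listing is not shown):

where sales.month in
   (select month from time
    where month in ('01-03', '01-04'))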
As we see, the query is significantly transformed, replacing the WHERE clause entries for each dimension table with a single subselect statement. These IN subqueries are ideal for bitmap indexes, because the bitmap can quickly scan the low-cardinality columns and produce a ROWID list of rows with matching values. This approach is far faster than the traditional method of joining the smallest reference table against the fact table and then joining each of the other reference tables against the intermediate table. The speed comes from reducing physical I/O: the indexes are read to gather the virtual table in memory, and the fact table is not accessed until the virtual index has everything it requires to go directly to the requested rows via the composite index on the fact table.
Starting with Oracle8i, the requirement for a concatenated index has changed, and the STAR hint requires bitmap indexes. The bitmap indexes can be joined more efficiently than a concatenated index and provide a faster result. As I have noted, the star query can be tricky to implement, and careful consideration must be given to the proper placement of indexes. Each dimension table must have an index on the join key. In Oracle7 and Oracle8, the large fact table must have a composite index consisting of all of the join keys from all of the dimension tables, while in Oracle8i you need bitmap indexes on the fact table. In addition, the sequencing of the keys in the fact table composite index must be in the correct order, or Oracle will not be able to use the index to service the query. Next, let's examine alternative index structures and see how they can fit into our physical design.
Bitmap Indexes

It was a common misconception that bitmap indexes were appropriate only for columns with a small number of distinct values (say, fewer than 50). Current research in Oracle8i has shown that bitmap indexes can substantially improve the speed of queries using columns with up to 1,000 distinct values, because retrieval from a bitmap index is done in RAM and is almost always faster than using a traditional B-tree index. Most experienced DBAs will look for tables that contain columns with fewer than 1,000 distinct values, build a bitmap index on these columns, and then see if the query runs faster.

Function-Based Indexes

To use the alternative indexing structures, you must first identify SQL statements that are using a built-in function (BIF). In the next example, we search the v$sqlarea view to find all SQL statements that are using the to_char BIF:
select sql_text from v$sqlarea      -- or stats$sql_summary
where sql_text like '%to_char%';
Once identified, FBIs can be created to remove the FTSs and replace them with index range scans.
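For example, an FBI built to match a to_char predicate might look like this (a sketch; the table, column, and format mask are illustrative):

create index hire_month_idx
   on emp (to_char(hire_date,'MM-YYYY'));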
Here you see the execution plan for a query that evaluates multiple conditions against an indexed job column. Note the use of the CONCATENATION operator:
OPERATION                  OPTIONS              OBJECT_NAME    POSITION
-------------------------- -------------------- -------------- --------
SELECT STATEMENT
CONCATENATION                                                         1
TABLE ACCESS               BY INDEX ROWID       EMP                   1
INDEX                      RANGE SCAN           JOB_IDX               1
TABLE ACCESS               BY INDEX ROWID       EMP                   2
INDEX                      RANGE SCAN           JOB_IDX               1
Now we add a first_rows hint and see an entirely different execution plan:
OPERATION                  OPTIONS              OBJECT_NAME    POSITION
-------------------------- -------------------- -------------- --------
SELECT STATEMENT                                                      1
INLIST ITERATOR                                                       1
TABLE ACCESS               BY INDEX ROWID       EMP                   1
INDEX                      RANGE SCAN           JOB_IDX               1
Of course, this query can also be rewritten to utilize the union SQL operator. Here is an equivalent query:
select /*+ first_rows */ ename
Here you see the execution plan using the UNION-ALL table access method:
OPERATION                  OPTIONS              OBJECT_NAME    POSITION
-------------------------- -------------------- -------------- --------
SELECT STATEMENT                                                      6
SORT                       UNIQUE                                     1
UNION-ALL                                                             1
TABLE ACCESS               BY INDEX ROWID       EMP                   1
INDEX                      RANGE SCAN           JOB_IDX               1
TABLE ACCESS               BY INDEX ROWID       EMP                   2
INDEX                      RANGE SCAN           JOB_IDX               1
Note that we've seen three alternative execution plans for the exact same result set. The point is that there are many opportunities to change the execution plan for queries that evaluate multiple conditions. In most cases, you must actually time the queries to see which execution plan is fastest for your specific query.
Full-index scans were enhanced in Oracle9i to provide support for FBIs. With Oracle8, intelligence was added to the SQL optimizer to determine whether a query might be resolved exclusively within an existing index. Oracle's IOT structure is an excellent example of how Oracle is able to bypass table access whenever an index exists. In an IOT structure, all table data is carried inside the B-tree structure of the index, making a separate table redundant. Whenever the Oracle SQL optimizer detects that the query is serviceable without touching table rows, Oracle invokes a full-index scan and quickly reads every block of the index without touching the table itself. It is important to note that a full-index scan does not traverse the index nodes in key order; rather, a block-by-block scan is performed and all of the index blocks are quickly cached. Best of all, Oracle invokes its multiblock read capability, using multiple processes to read the blocks.
It isn't always intuitive whether a fast full-index scan is the quickest way to service a query, because of all the variables involved. So most expert SQL tuners will manually time any query that meets the fast full-index scan criteria and see if the response time improves with the full-index scan.
BASICS OF FBIs
Prior to Oracle9i, full-index scans were possible only when the index contained no null values; in other words, the indexed column had to be declared with a NOT NULL clause for Oracle to be able to use the index. This has been greatly enhanced in Oracle9i with support for index-only scans using FBIs. As a quick review, FBIs were an important enhancement in Oracle8 because they provided a mechanism for the virtual elimination of the unnecessary long-table full scan. Because an FBI can exactly replicate any function on a column in the WHERE clause of a query, Oracle can always match the WHERE clause of a SQL query with an index. Here, I will use a simple example of a student table to illustrate how a full-index scan would work with an FBI:
create table student
   (student_name  varchar2(40),
    date_of_birth date);
Using this table, create a concatenated FBI on all columns of the table. In this example, the functions are initcap (i.e., capitalize the first letter of each word) and to_char (i.e., change a date to a character):
create index whole_student
   on student
   (initcap(student_name), to_char(date_of_birth,'MM-DD-YY'));
With the FBI defined, Oracle will recognize that any SQL statement that references these columns can use the full-index scan. Here is an example of a SQL query that matches the FBI (the predicate literal is illustrative):
select * from student
where initcap(student_name) = 'Jones';
A standard B-tree index never includes rows whose key columns are all NULL. Whenever a query specifying where ename is NULL is issued, there are no usable index entries, and Oracle performs an unnecessary large-table FTS:
Execution Plan
----------------------------------------------------------
   0    SELECT STATEMENT Optimizer=CHOOSE (Cost=1 Card=1 Bytes=6)
   1  0   TABLE ACCESS (FULL) OF 'EMP' (Cost=1 Card=1 Bytes=6)
To get around this problem, we can create an FBI using the nvl built-in SQL function, so that NULL values are represented in the index:
-- create an FBI on the ename column that includes NULL values
create index emp_null_ename_idx
   on emp (nvl(ename,'null'));

analyze index emp_null_ename_idx compute statistics;
Now we can use the index and greatly improve the speed of any queries that require access to the NULL columns. Note that we must make one of two changes:
1. Add a hint to force the index.
2. Change the WHERE predicate to match the function.

Here is an example of using an index on NULL column values:
-- insert a NULL row
insert into emp (empno) values (999);

set autotrace traceonly explain;

-- test index access (hint forces index usage)
select /*+ index(e emp_null_ename_idx) */ ename
from emp e
where ename is NULL;

-- test index access (predicate rewritten to match the FBI)
select /*+ index(e emp_null_ename_idx) */ ename
from emp e
where nvl(ename,'null') = 'null';
Several factors influence the cost-based optimizer's decision to use a full-index scan:

- The degree of parallelism on the index. Note that the parallel degree of the index is set independently; the index does not inherit the degree of parallelism of the table.
- The setting for optimizer_index_cost_adj. This controls the propensity of the cost-based optimizer to favor full-index scans.
- The setting for db_file_multiblock_read_count. This parameter factors into the cost of a full-index scan; the higher the value, the cheaper the full-index scan will appear.
- The presence of histograms on the index. For skewed indexes, histograms help the cost-based optimizer evaluate the number of rows returned by the query.
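For example, two of these settings can be adjusted at the session level to bias the optimizer toward full-index scans (the values shown are illustrative, not recommendations):

alter session set optimizer_index_cost_adj = 20;
alter session set db_file_multiblock_read_count = 32;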
Consider a database for parts and suppliers, where each part has many suppliers and each supplier provides many parts (Figure 7.7). For this example, let's assume the database has 300 types of parts and the suppliers provide parts in all 50 states, so there are 50 distinct values in the STATE column and only 300 distinct values in the PART_TYPE column. Note in the listing below that we create an index on the INVENTORY table using columns contained in the SUPPLIER and PARTS tables. The idea behind a bitmap join index is to prejoin the low-cardinality columns, making the overall join faster. It is well known that bitmap indexes can improve the performance of Oracle9i queries where the predicates involve low-cardinality columns, but this technique could not previously be employed when the low-cardinality columns resided in a foreign table. To create a bitmap join index, issue the following Oracle DDL. Note the inclusion of the FROM and WHERE clauses inside the CREATE INDEX syntax (the join key columns shown are illustrative):
create bitmap index part_suppliers_state
   on inventory (parts.part_type, supplier.state)
   from inventory, parts, supplier
   where inventory.part_id = parts.part_id
     and inventory.supplier_id = supplier.supplier_id;
Prior to Oracle9i, this SQL query would be serviced by a nested loop or hash join of all three tables. With a bitmap join index, the index has prejoined the tables, and the query can quickly retrieve a ROWID list of matching rows in all three tables. Note that this bitmap join index specified the join criteria for the three tables and created a bitmap index on the junction table (INVENTORY) with the PART_TYPE and STATE keys. Oracle benchmarks claim that bitmap join indexes can run a query more than eight times faster than traditional indexing methods. However, this speed improvement depends on many factors, and the bitmap join index is not a panacea. Some restrictions on using the bitmap join index include:

- The indexed columns must be of low cardinality, usually with fewer than 300 distinct values.
- The query must not reference any data columns in the WHERE clause that are not contained in the index.
- The overhead when updating bitmap join indexes is substantial.

In practical use, bitmap join indexes are dropped and rebuilt each evening during the daily batch load jobs, which means they are useful only for Oracle data warehouses that remain read-only during the processing day. Remember, bitmap join indexes can tremendously speed up specific data warehouse queries, but at the expense of prejoining the tables at bitmap index creation time. You must also be concerned about high-volume updates: bitmap indexes are notoriously slow to change when the table data changes, and this can severely slow down INSERT and UPDATE DML against the target tables.
Oracle9i has introduced extremely sophisticated execution plan features that can dramatically improve query performance, but these features cannot be used automatically. The Oracle9i professional's challenge is to understand these new indexing features, analyze the trade-offs of additional indexing, and judge when the new features can be used to speed queries.
CONCLUSION
This chapter has covered one of the most important areas of Oracle physical design: the creation and implementation of high-speed data access structures. The main point of this chapter is that the proper design of indexes can minimize the amount of work performed by Oracle at runtime and reduce overall response time for individual transactions.