
Bitmap Join Indexes


Bitmap Join Indexes

In Oracle8i, performance improvements were made by using materialized views to store the resulting
rows of queries. The benefits of this mechanism are still relevant, but a certain subset of the
queries used in a data warehouse may benefit from the use of Bitmap Join Indexes.
● How It Works
● Creation
● Restrictions
How It Works
In a Bitmap Index, each distinct value for the specified column is associated with a bitmap where
each bit represents a row in the table. A '1' means that row contains that value, a '0' means it
doesn't.
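As a rough illustration of the structure just described, the following Python sketch builds one bitmap per distinct value. A Python integer serves as the bit vector, where bit i stands for row i. This is a simplified model, not Oracle's actual storage format (Oracle compresses its bitmaps). The sample column data is hypothetical.

```python
from collections import defaultdict


def build_bitmap_index(column):
    """Map each distinct value to an int whose bit i is set when row i holds that value."""
    bitmaps = defaultdict(int)
    for row_num, value in enumerate(column):
        bitmaps[value] |= 1 << row_num  # '1' = this row contains the value
    return dict(bitmaps)


def rows_for(bitmap):
    """Decode a bitmap back into the list of row numbers whose bit is set."""
    return [i for i in range(bitmap.bit_length()) if bitmap >> i & 1]


# Hypothetical column: customer state for five rows
states = ["CA", "NY", "CA", "TX", "CA"]
index = build_bitmap_index(states)
print(format(index["CA"], "05b"))  # → 10101 (read right to left: rows 0, 2, 4)
print(rows_for(index["CA"]))       # → [0, 2, 4]
```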

Bitmap Join Indexes extend this concept such that the index contains the data to support the join
query, allowing the query to retrieve the data from the index rather than referencing the join
tables. Since the information is compressed into a bitmap, the size of the resulting structure is
significantly smaller than the corresponding materialized view.

Creation
The index is created with reference to the columns in the joined tables that will be used to support
the query. In the following example an index is created where the SALES table is joined to the
CUSTOMERS table:
CREATE BITMAP INDEX cust_sales_bji
ON sales(customers.state)
FROM sales, customers
WHERE sales.cust_id = customers.cust_id;
Since the CUSTOMERS.STATE column is referenced in the ON clause of the index, queries on
the SALES table that join to the CUSTOMERS table to retrieve the STATE column can do so
without referencing the CUSTOMERS table. Instead the data is read from the bitmap join index:
SELECT SUM(sales.dollar_amount)
FROM sales,
customers
WHERE sales.cust_id = customers.cust_id
AND customers.state = 'California';
When dealing with large datasets, this reduction in processing can be substantial.
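The idea behind cust_sales_bji can be modelled in a few lines of Python. The join between sales and customers is done once, when the index is built. The query then reads only the bitmap for 'California' and the matching sales rows, never the customers data. This is a simplified sketch: the rows are invented, list positions stand in for rowids, and it does not reflect how Oracle stores or scans the index.

```python
from collections import defaultdict

# Hypothetical sample data; names mirror the SQL example, rows are invented.
customers = {  # cust_id -> state (cust_id is unique, as the restrictions require)
    1: "California",
    2: "Texas",
    3: "California",
}
sales = [  # (cust_id, dollar_amount); list position plays the role of the rowid
    (1, 100.0),
    (2, 250.0),
    (3, 75.0),
    (1, 20.0),
]


def build_bitmap_join_index(fact_rows, dim_lookup):
    """For each dimension value, set the bits of the fact rows that join to it."""
    bitmaps = defaultdict(int)
    for rowid, (cust_id, _amount) in enumerate(fact_rows):
        state = dim_lookup[cust_id]  # equi-join on the unique dimension key, done once
        bitmaps[state] |= 1 << rowid
    return dict(bitmaps)


def sum_sales_for_state(fact_rows, bji, state):
    """SUM(dollar_amount) for one state, using only the index and the fact rows."""
    bitmap = bji.get(state, 0)
    return sum(amount for rowid, (_cust, amount) in enumerate(fact_rows)
               if bitmap >> rowid & 1)


cust_sales_bji = build_bitmap_join_index(sales, customers)
print(sum_sales_for_state(sales, cust_sales_bji, "California"))  # → 195.0
```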

Restrictions
Bitmap Join Indexes have the following restrictions:
● Parallel DML is currently only supported on the fact table. Parallel DML on one of the
participating dimension tables will mark the index as unusable.
● Only one table can be updated concurrently by different transactions when using the
bitmap join index.
● No table can appear twice in the join.
● You cannot create a bitmap join index on an index-organized table or a temporary table.
● The columns in the index must all be columns of the dimension tables.
● The dimension table join columns must be either primary key columns or have unique
constraints.
● If a dimension table has a composite primary key, each column in the primary key must be
part of the join.
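One plausible reason for the unique-key restriction is offered here as an interpretation; the text above does not state it. A bitmap holds a single bit per fact row, so it can record that a row joins to a value but not how many times. If the dimension join column were not unique, one fact row could join to several dimension rows, and the bitmap could not represent that. The hypothetical check below detects the situation; the table contents and helper names are invented for illustration.

```python
from collections import Counter


def join_key_is_unique(dim_rows, key):
    """True if no value of `key` appears in more than one dimension row."""
    counts = Counter(row[key] for row in dim_rows)
    return all(n == 1 for n in counts.values())


def matches_per_fact_row(fact_rows, dim_rows, fk, key):
    """How many dimension rows each fact row would join to."""
    counts = Counter(row[key] for row in dim_rows)
    return [counts.get(row[fk], 0) for row in fact_rows]


# Hypothetical tables: cust_id 1 appears twice, so it is not unique.
customers = [
    {"cust_id": 1, "state": "California"},
    {"cust_id": 1, "state": "Nevada"},
    {"cust_id": 2, "state": "Texas"},
]
sales = [{"cust_id": 1, "amount": 10}, {"cust_id": 2, "amount": 5}]

print(join_key_is_unique(customers, "cust_id"))                      # → False
print(matches_per_fact_row(sales, customers, "cust_id", "cust_id"))  # → [2, 1]
```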
Hope this helps. Regards Tim...
In a data warehouse, B-tree indexes should be used only for unique columns or
other columns with very high cardinalities (that is, columns that are almost
unique). The majority of indexes in a data warehouse should be bitmap
indexes.
Bitmap indexes are most effective for queries that contain multiple conditions
in the WHERE clause. Rows that satisfy some, but not all, conditions are filtered out
before the table itself is accessed. This improves response time, often dramatically.
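The filtering described above amounts to a bitwise AND of one bitmap per condition. The sketch below uses hypothetical 8-row bitmaps and a bit-i-equals-row-i convention. Only the rows whose bit survives the AND would need a table access.

```python
from functools import reduce
import operator

# Hypothetical bitmaps over 8 rows (bit i = row i), one per WHERE-clause condition.
gender_f = 0b10110110    # rows where gender = 'F'
region_w = 0b10011100    # rows where region = 'WEST'
status_act = 0b11010101  # rows where status = 'ACTIVE'


def rows_matching_all(*bitmaps):
    """AND the bitmaps together and return the row numbers that satisfy every condition."""
    combined = reduce(operator.and_, bitmaps)
    return [i for i in range(combined.bit_length()) if combined >> i & 1]


print(rows_matching_all(gender_f, region_w, status_act))  # → [2, 4, 7]
```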
Bitmap indexes on partitioned tables must be local indexes.
Bitmap join indexes -
A bitmap join index is a space efficient way of reducing the volume of data
that must be joined by performing restrictions in advance. For each value in a
column of a table, a bitmap join index stores the rowids of corresponding rows
in one or more other tables. In a data warehousing environment, the join
condition is an equi-inner join between the primary key column or columns of
the dimension tables and the foreign key column or columns in the fact table.
Specify NOLOGGING clauses on the create index statement.
Bitmap join index joining multiple dimension tables to one fact table -
CREATE BITMAP INDEX sales_c_gender_p_cat_bjix
ON sales(customers.cust_gender, products.prod_category)
FROM sales, customers, products
WHERE sales.cust_id = customers.cust_id
AND sales.prod_id = products.prod_id
LOCAL NOLOGGING;
B-tree indexes are most commonly used in a data warehouse to index unique or
near-unique keys. In many cases, it may not be necessary to index these
columns in a data warehouse, because unique constraints can be maintained
without an index, and because typical data warehouse queries may not work
better with such indexes. Bitmap indexes should be more common than B-tree
indexes in most data warehouse environments.
References -
Oracle Data Warehousing Guide
● If a B*tree index is not an efficient mechanism for accessing data, it is
unlikely to become more efficient simply because you convert it to a
bitmap index.
● Bitmap indexes can usually be built quickly, and tend to be surprisingly
small.
● The size of the bitmap index varies dramatically with the distribution of
the data.
● Bitmap indexes are typically useful only for queries that can use several
such indexes at once.
● Updates to bitmapped columns, and general insertion/deletion of data
can cause serious lock contention.
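To see why bitmap size depends on data distribution, consider plain run-length encoding. This is used here only as a stand-in; Oracle's own bitmap compression scheme is not described in this document and is not reproduced. Two hypothetical 12-row bitmaps each mark 4 rows. The version whose matching rows are stored together encodes to far fewer runs than the scattered one.

```python
def run_lengths(bits):
    """Run-length encode a bit string into a list of (bit, run_length) pairs."""
    runs = []
    for b in bits:
        if runs and runs[-1][0] == b:
            runs[-1][1] += 1
        else:
            runs.append([b, 1])
    return [tuple(r) for r in runs]


# Hypothetical 12-row table; the indexed value occurs in 4 rows in both cases.
clustered = "111100000000"  # matching rows stored together
scattered = "100100100100"  # matching rows spread out

print(len(run_lengths(clustered)))  # → 2
print(len(run_lengths(scattered)))  # → 8
```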



Bitmap join indexes

Donald K. Burleson

Oracle9i has introduced a new method to speed up join queries
against very large data warehouse tables. This new method is called
the bitmap join index, and this new table access method requires the
creation of an index that performs the join at index creation time and
creates a bitmap index of the keys that are used in the join.
For our example, we will use a many-to-many relationship where we
have parts and suppliers. Each part has many suppliers and each
supplier provides many parts.
In this example, the database has 200 types of parts and the suppliers
provide parts in all 50 states. The idea behind a bitmap join index is to
pre-join the low cardinality columns together, thereby making the
overall join faster.
To create a bitmap join index we issue the following SQL. Note the
inclusion of the FROM and WHERE clauses inside the CREATE INDEX
syntax.
create bitmap index
part_suppliers_state
on
inventory(p.part_type, s.state)
from
inventory i,
parts p,
supplier s
where
i.part_id = p.part_id
and
i.supplier_id = s.supplier_id;

While B-tree indexes are normally used on the keys of the junction (inventory) records, we can
improve the performance of Oracle queries where the predicates
involve the low cardinality columns. For example, look at the query
below where we want a list of all suppliers of pistons in North Carolina:
select
supplier_name
from
parts
natural join
inventory
natural join
supplier
where
part_type = 'piston'
and
state = 'nc'
;

Prior to Oracle9i, this query would require a nested loop join or hash
join of all three tables. In Oracle9i, we can pre-join these tables based
on the low cardinality columns.
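The pre-join can be pictured with a small Python model of the two-column index part_suppliers_state. There is one bitmap over inventory rows for each (part_type, state) combination, and it is computed once at build time. The sample rows are hypothetical, and list positions stand in for rowids. The final lookup of supplier_name still touches the supplier data, because that column is not part of the index.

```python
from collections import defaultdict

# Hypothetical sample rows; table names follow the article, the data is invented.
parts = {10: "piston", 11: "valve"}                # part_id -> part_type
supplier = {1: ("Acme", "nc"), 2: ("Bolt", "va")}  # supplier_id -> (supplier_name, state)
inventory = [(10, 1), (11, 1), (10, 2)]            # (part_id, supplier_id); position acts as rowid

# Index build time: join once and keep one bitmap per (part_type, state) key.
bji = defaultdict(int)
for rowid, (part_id, supplier_id) in enumerate(inventory):
    key = (parts[part_id], supplier[supplier_id][1])
    bji[key] |= 1 << rowid

# Query time: both predicates are answered by a single bitmap lookup.
bitmap = bji.get(("piston", "nc"), 0)
matching = [inventory[r] for r in range(len(inventory)) if bitmap >> r & 1]
# supplier_name is not in the index, so the supplier rows are still read here.
names = sorted({supplier[s][0] for _part, s in matching})
print(names)  # → ['Acme']
```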
For queries that have additional criteria in the WHERE clause that do
not appear in the bitmap join index, Oracle will be unable to use this
index to service the query.
While Oracle markets this new feature with great fanfare, the bitmap
join index is only useful for table joins that involve low-cardinality
columns (e.g. columns with fewer than 300 distinct values). Bitmap join
indexes are also not useful for OLTP databases because of the high
overhead associated with updating bitmap indexes.
Oracle claims that this indexing method results in more than 8x
improvement in table joins in cases where all of the query data resides
inside the index. However, this claim is dependent upon many factors,
and the bitmap join is not a panacea. In many cases the traditional
hash join or nested loop join may out-perform a bitmap join. Some
limitations of the bitmap join index include:
● The indexed columns must be of low cardinality, usually with
fewer than 300 distinct values.
● The query must not have any references in the WHERE clause to
data columns that are not contained in the index.
● The overhead when updating bitmap join indexes is substantial.
For practical use, bitmap join indexes are dropped and re-built
each evening around the daily batch load jobs. Hence bitmap join
indexes are only useful for Oracle data warehouses that remain
read-only during the processing day.
In sum, bitmap join indexes will tremendously speed up specific data
warehouse queries, but at the expense of pre-joining the tables at
bitmap index creation time.
Managing Clusters
This chapter describes aspects of managing clusters. It contains the following topics
relating to the management of indexed clusters, clustered tables, and cluster indexes:
● Guidelines for Managing Clusters
● Creating Clusters
● Altering Clusters
● Dropping Clusters
● Viewing Information About Clusters
See Also:
○ Chapter 19, "Managing Hash Clusters" for a description of
another type of cluster: a hash cluster
○ Chapter 14, "Managing Space for Schema Objects" is
recommended reading before attempting tasks described in this
chapter
Guidelines for Managing Clusters
A cluster provides an optional method of storing table data. A cluster is made up of a
group of tables that share the same data blocks. The tables are grouped together because
they share common columns and are often used together. For example, the emp and dept
tables share the deptno column. When you cluster the emp and dept tables (see Figure 18-1),
Oracle physically stores all rows for each department from both the emp and dept
tables in the same data blocks.
Because clusters store related rows of different tables together in the same data blocks,
properly used clusters offer two primary benefits:
● Disk I/O is reduced and access time improves for joins of clustered tables.
● The cluster key is the column, or group of columns, that the clustered tables have
in common. You specify the columns of the cluster key when creating the cluster.
You subsequently specify the same columns when creating every table added to
the cluster. Each cluster key value is stored only once each in the cluster and the
cluster index, no matter how many rows of different tables contain the value.
Therefore, less storage might be required to store related table and index data in a
cluster than is necessary in non-clustered table format. For example, in Figure 18-1,
notice how each cluster key (each deptno) is stored just once for many rows
that contain the same value in both the emp and dept tables.
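A toy Python model of the emp/dept cluster makes the storage benefit concrete. Rows from both tables are grouped under their shared cluster key, so each deptno value is held once rather than once per row. One dictionary entry stands in for the block or blocks that hold a key. This is not Oracle's block format, and the employee and department rows are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows for the emp and dept tables.
dept = [(10, "SALES"), (20, "ADMIN")]                                  # (deptno, dname)
emp = [(1001, "SMITH", 10), (1002, "JONES", 20), (1003, "KING", 10)]   # (empno, ename, deptno)

# Group rows of both tables under the shared cluster key (deptno).
cluster = defaultdict(lambda: {"dept": [], "emp": []})
for deptno, dname in dept:
    cluster[deptno]["dept"].append((dname,))       # deptno held once, as the key
for empno, ename, deptno in emp:
    cluster[deptno]["emp"].append((empno, ename))  # key column not repeated per row

for key, rows in sorted(cluster.items()):
    print(key, rows)
```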
After creating a cluster, you can create tables in the cluster. However, before any rows
can be inserted into the clustered tables, a cluster index must be created. Using clusters
does not affect the creation of additional indexes on the clustered tables; they can be
created and dropped as usual.
You should not use clusters for tables that are frequently accessed individually.

Figure 18-1 Clustered Table Data



The following sections describe guidelines to consider when managing clusters and
contain the following topics:
● Choose Appropriate Tables for the Cluster
● Choose Appropriate Columns for the Cluster Key
● Specify Data Block Space Use
● Specify the Space Required by an Average Cluster Key and Its Associated Rows
● Specify the Location of Each Cluster and Cluster Index Rows
● Estimate Cluster Size and Set Storage Parameters
See Also:
○ Oracle9i Database Concepts for more information about
clusters
○ Oracle9i Database Performance Tuning Guide and Reference
for guidelines on when to use clusters

Choose Appropriate Tables for the Cluster


Use clusters for tables for which the following conditions are true:
● The tables are primarily queried--that is, tables that are not predominantly inserted
into or updated.
● Records from the tables are frequently queried together or joined.

Choose Appropriate Columns for the Cluster Key


Choose cluster key columns carefully. If multiple columns are used in queries that join
the tables, make the cluster key a composite key. In general, the characteristics that
indicate a good cluster index are the same as those for any index. For information about
characteristics of a good index, see "Guidelines for Managing Indexes".
A good cluster key has enough unique values so that the group of rows corresponding to
each key value fills approximately one data block. Having too few rows for each cluster
key value can waste space and result in negligible performance gains. Cluster keys that
are so specific that only a few rows share a common value can cause wasted space in
blocks, unless a small SIZE was specified at cluster creation time (see "Specify the Space
Required by an Average Cluster Key and Its Associated Rows").
Too many rows for each cluster key value can cause extra searching to find rows for that
key. Cluster keys on values that are too general (for example, male and female) result in
excessive searching and can result in worse performance than with no clustering.
A cluster index cannot be unique or include a column defined as LONG.

Specify Data Block Space Use


By specifying the PCTFREE and PCTUSED parameters during the creation of a cluster, you
can affect the space utilization and amount of space reserved for updates to the current
rows in the data blocks of a cluster's data segment. PCTFREE and PCTUSED parameters
specified for tables created in a cluster are ignored; clustered tables automatically use the
settings specified for the cluster.
See Also:
"Managing Space in Data Blocks" for information about setting the
PCTFREE and PCTUSED parameters

Specify the Space Required by an Average Cluster Key and Its Associated Rows

The CREATE CLUSTER statement has an optional argument, SIZE, which is the estimated
number of bytes required by an average cluster key and its associated rows. Oracle uses
the SIZE parameter when performing the following tasks:
● Estimating the number of cluster keys (and associated rows) that can fit in a
clustered data block
● Limiting the number of cluster keys placed in a clustered data block. This
maximizes the storage efficiency of keys within a cluster.
SIZE does not limit the space that can be used by a given cluster key. For example, if
SIZE is set such that two cluster keys can fit in one data block, any amount of the
available data block space can still be used by either of the cluster keys.
By default, Oracle stores only one cluster key and its associated rows in each data block
of the cluster's data segment. Although block size can vary from one operating system to
the next, the rule of one key for each block is maintained as clustered tables are imported
to other databases on other machines.
If all the rows for a given cluster key value cannot fit in one block, the blocks are chained
together to speed access to all the values with the given key. The cluster index points to
the beginning of the chain of blocks, each of which contains the cluster key value and
associated rows. If the cluster SIZE is such that more than one key fits in a block, blocks
can belong to more than one chain.
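The arithmetic behind SIZE can be approximated as follows. The simplified model assumes usable_block_bytes already excludes block overhead and PCTFREE space. Oracle's exact calculation accounts for overhead that is not modelled here. The helper names and figures are hypothetical; only SIZE 600 comes from the CREATE CLUSTER example in this chapter.

```python
import math


def keys_per_block(usable_block_bytes, size_bytes=None):
    """Rough count of cluster keys placed in one block.

    Simplified model: usable_block_bytes is assumed to exclude block overhead
    and PCTFREE space. Without SIZE, one key per block (the default).
    """
    if size_bytes is None:
        return 1
    return max(1, usable_block_bytes // size_bytes)


def chain_length(key_bytes, usable_block_bytes):
    """Number of chained blocks needed when one key's rows overflow a block."""
    return math.ceil(key_bytes / usable_block_bytes)


# Hypothetical figures: about 3,800 usable bytes per block, SIZE 600.
print(keys_per_block(3800, 600))  # → 6
print(keys_per_block(3800))       # → 1
print(chain_length(9000, 3800))   # → 3
```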

Specify the Location of Each Cluster and Cluster Index Rows


If you have the proper privileges and tablespace quota, you can create a new cluster and
the associated cluster index in any tablespace that is currently online. Always specify the
TABLESPACE option in a CREATE CLUSTER/INDEX statement to identify the tablespace to
store the new cluster or index.
The cluster and its cluster index can be created in different tablespaces. In fact, creating a
cluster and its index in different tablespaces that are stored on different storage devices
allows table data and index data to be retrieved simultaneously with minimal disk
contention.

Estimate Cluster Size and Set Storage Parameters


The following are benefits of estimating a cluster's size before creating it:
● You can use the combined estimated size of clusters, along with estimates for
indexes, rollback segments, and redo log files, to determine the amount of disk
space that is required to hold an intended database. From these estimates, you can
make correct hardware purchases and other decisions.
● You can use the estimated size of an individual cluster to better manage the disk
space that the cluster will use. When a cluster is created, you can set appropriate
storage parameters and improve I/O performance of applications that use the
cluster.
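A back-of-the-envelope estimate of an index cluster's data segment might look like the sketch below. It subtracts only PCTFREE from each block, ignores block header overhead, and assumes no key's rows overflow a block. The function name and all input figures are hypothetical, so treat the result as a starting point for storage parameters rather than an exact size.

```python
import math


def estimate_cluster_bytes(num_keys, size_bytes, block_bytes, pctfree):
    """Rough data-segment size for an index cluster (simplified model).

    Assumes every key fits in one block (no chaining) and ignores block
    header overhead; only PCTFREE is subtracted from each block.
    """
    usable = block_bytes * (100 - pctfree) // 100
    keys_per_block = max(1, usable // size_bytes)
    blocks = math.ceil(num_keys / keys_per_block)
    return blocks * block_bytes


# Hypothetical: 1,000 departments, SIZE 600, 8 KB blocks, PCTFREE 5
print(estimate_cluster_bytes(1000, 600, 8192, 5))  # → 688128
```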
Whether or not you estimate table size before creation, you can explicitly set storage
parameters when creating each non-clustered table. Any storage parameter that you do
not explicitly set when creating or subsequently altering a table automatically uses the
corresponding default storage parameter set for the tablespace in which the table resides.
Clustered tables also automatically use the storage parameters of the cluster.
Creating Clusters
To create a cluster in your schema, you must have the CREATE CLUSTER system privilege
and a quota for the tablespace intended to contain the cluster or the UNLIMITED
TABLESPACE system privilege.
To create a cluster in another user's schema you must have the CREATE ANY CLUSTER
system privilege, and the owner must have a quota for the tablespace intended to contain
the cluster or the UNLIMITED TABLESPACE system privilege.
You create a cluster using the CREATE CLUSTER statement. The following statement
creates a cluster named emp_dept, which stores the emp and dept tables, clustered by the
deptno column:
CREATE CLUSTER emp_dept (deptno NUMBER(3))
PCTUSED 80
PCTFREE 5
SIZE 600
TABLESPACE users
STORAGE (INITIAL 200K
NEXT 300K
MINEXTENTS 2
MAXEXTENTS 20
PCTINCREASE 33);

If no INDEX keyword is specified, as is true in this example, an index cluster is created by
default. You can also create a HASH cluster when hash parameters (HASHKEYS, HASH IS,
or SINGLE TABLE HASHKEYS) are specified. Hash clusters are described in Chapter 19,
"Managing Hash Clusters".
See Also:
Oracle9i SQL Reference for a more complete description of syntax,
restrictions, and authorizations required for the SQL statements
presented in this chapter

Creating Clustered Tables


To create a table in a cluster, you must have either the CREATE TABLE or CREATE ANY
TABLE system privilege. You do not need a tablespace quota or the UNLIMITED
TABLESPACE system privilege to create a table in a cluster.
You create a table in a cluster using the CREATE TABLE statement with the CLUSTER
option. The emp and dept tables can be created in the emp_dept cluster using the
following statements:
-- Create dept first so that the REFERENCES dept constraint in emp can resolve.
CREATE TABLE dept (
deptno NUMBER(3) PRIMARY KEY, . . . )
CLUSTER emp_dept (deptno);

CREATE TABLE emp (
empno NUMBER(5) PRIMARY KEY,
ename VARCHAR2(15) NOT NULL,
. . .
deptno NUMBER(3) REFERENCES dept)
CLUSTER emp_dept (deptno);
Note:
You can specify the schema for a clustered table in the CREATE TABLE
statement. A clustered table can be in a different schema than the
schema containing the cluster. Also, the names of the columns are not
required to match, but their structure must match.

Creating Cluster Indexes


To create a cluster index, one of the following conditions must be true:
● Your schema contains the cluster.
● You have the CREATE ANY INDEX system privilege.
In either case, you must also have either a quota for the tablespace intended to contain the
cluster index, or the UNLIMITED TABLESPACE system privilege.
A cluster index must be created before any rows can be inserted into any clustered table.
The following statement creates a cluster index for the emp_dept cluster:
CREATE INDEX emp_dept_index
ON CLUSTER emp_dept
INITRANS 2
MAXTRANS 5
TABLESPACE users
STORAGE (INITIAL 50K
NEXT 50K
MINEXTENTS 2
MAXEXTENTS 10
PCTINCREASE 33)
PCTFREE 5;

The cluster index clause (ON CLUSTER) identifies the cluster, emp_dept, for which the
cluster index is being created. The statement also explicitly specifies several storage
settings for the cluster index.
Altering Clusters
To alter a cluster, your schema must contain the cluster or you must have the ALTER ANY
CLUSTER system privilege. You can alter an existing cluster to change the following
settings:
● Physical attributes (PCTFREE, PCTUSED, INITRANS, MAXTRANS, and storage
characteristics)
● The average amount of space required to store all the rows for a cluster key value
(SIZE)
● The default degree of parallelism
Additionally, you can explicitly allocate a new extent for the cluster, or deallocate any
unused extents at the end of the cluster. Oracle dynamically allocates additional extents
for the data segment of a cluster as required. In some circumstances, however, you might
want to explicitly allocate an additional extent for a cluster. For example, when using
Oracle9i Real Application Clusters, you can allocate an extent of a cluster explicitly for a
specific instance. You allocate a new extent for a cluster using the ALTER CLUSTER
statement with the ALLOCATE EXTENT clause.
When you alter data block space usage parameters (PCTFREE and PCTUSED) or the cluster
size parameter (SIZE) of a cluster, the new settings apply to all data blocks used by the
cluster, including blocks already allocated and blocks subsequently allocated for the
cluster. Blocks already allocated for the table are reorganized when necessary (not
immediately).
When you alter the transaction entry settings (INITRANS and MAXTRANS) of a cluster, a
new setting for INITRANS applies only to data blocks subsequently allocated for the
cluster, while a new setting for MAXTRANS applies to all blocks (already and subsequently
allocated blocks) of a cluster.
The storage parameters INITIAL and MINEXTENTS cannot be altered. All new settings for
the other storage parameters affect only extents subsequently allocated for the cluster.
To alter a cluster, use the ALTER CLUSTER statement. The following statement alters the
emp_dept cluster:
ALTER CLUSTER emp_dept
PCTFREE 30
PCTUSED 60;

See Also:
Oracle9i Real Application Clusters Administration for specific uses of
the ALTER CLUSTER statement in an Oracle Real Application Clusters
environment

Altering Clustered Tables


You can alter clustered tables using the ALTER TABLE statement. However, any data
block space parameters, transaction entry parameters, or storage parameters you set in an
ALTER TABLE statement for a clustered table generate an error message (ORA-01771,
illegal option for a clustered table). Oracle uses the parameters of the cluster
for all clustered tables. Therefore, you can use the ALTER TABLE statement only to add or
modify columns, drop non-cluster key columns, or add, drop, enable, or disable integrity
constraints or triggers for a clustered table. For information about altering tables, see
"Altering Tables".

Altering Cluster Indexes


You alter cluster indexes exactly as you do other indexes. See "Altering Indexes".

Note:
When estimating the size of cluster indexes, remember that the index is
on each cluster key, not the actual rows. Therefore, each key appears
only once in the index.

Dropping Clusters
A cluster can be dropped if the tables within the cluster are no longer necessary. When a
cluster is dropped, so are the tables within the cluster and the corresponding cluster index.
All extents belonging to both the cluster's data segment and the index segment of the
cluster index are returned to the containing tablespace and become available for other
segments within the tablespace.
To drop a cluster that contains no tables, and its cluster index, use the DROP CLUSTER
statement. For example, the following statement drops the empty cluster named
emp_dept:
DROP CLUSTER emp_dept;

If the cluster contains one or more clustered tables and you intend to drop the tables as
well, add the INCLUDING TABLES option of the DROP CLUSTER statement, as follows:
DROP CLUSTER emp_dept INCLUDING TABLES;

If the INCLUDING TABLES option is not included and the cluster contains tables, an error
is returned.
If one or more tables in a cluster contain primary or unique keys that are referenced by
FOREIGN KEY constraints of tables outside the cluster, the cluster cannot be dropped
unless the dependent FOREIGN KEY constraints are also dropped. This can be easily done
using the CASCADE CONSTRAINTS option of the DROP CLUSTER statement, as shown in the
following example:
DROP CLUSTER emp_dept INCLUDING TABLES CASCADE CONSTRAINTS;

Oracle returns an error if you do not use the CASCADE CONSTRAINTS option and
constraints exist.

Dropping Clustered Tables


To drop a cluster, your schema must contain the cluster or you must have the DROP ANY
CLUSTER system privilege. You do not need additional privileges to drop a cluster that
contains tables, even if the clustered tables are not owned by the owner of the cluster.
Clustered tables can be dropped individually without affecting the table's cluster, other
clustered tables, or the cluster index. A clustered table is dropped just as a non-clustered
table is dropped--with the DROP TABLE statement. See "Dropping Tables".

Note:
When you drop a single table from a cluster, Oracle deletes each row of
the table individually. To maximize efficiency when you intend to drop
an entire cluster, drop the cluster including all tables by using the DROP
CLUSTER statement with the INCLUDING TABLES option. Drop an
individual table from a cluster (using the DROP TABLE statement) only if
you want the rest of the cluster to remain.

Dropping Cluster Indexes


A cluster index can be dropped without affecting the cluster or its clustered tables.
However, clustered tables cannot be used if there is no cluster index; you must re-create
the cluster index to allow access to the cluster. Cluster indexes are sometimes dropped as
part of the procedure to rebuild a fragmented cluster index. For information about
dropping an index, see "Dropping Indexes".
Viewing Information About Clusters
The following views display information about clusters:
● DBA_CLUSTERS, ALL_CLUSTERS, USER_CLUSTERS: The DBA view describes all clusters in the
database. The ALL view describes all clusters accessible to the user. The USER view is
restricted to clusters owned by the user. Some columns in these views contain statistics
that are generated by the DBMS_STATS package or the ANALYZE statement.
● DBA_CLU_COLUMNS, USER_CLU_COLUMNS: These views map table columns to cluster columns.
