3: Database Systems: Part V: Physical Database Design

3: Database Systems
Part V: Physical Database Design
Physical Database Design
The process of mapping the logical data model into an internal set of physical database structures
Major consideration:
Can the user get the desired information, in the appropriate format, and in a timely fashion (i.e., with an acceptable response time)?
Objectives of Physical Database Design
Implement the database as a set of stored records, files, indexes, etc.
Provide adequate performance
Ensure database integrity, security, and recoverability
Major Inputs to Physical Design
Logical data model
User processing requirements, including:
Size of database and frequency of use
Response time, security, backup, recovery, and retention of data
DBMS characteristics and other components of the computer operating environment
Components of Physical Design
Data volume and usage analysis
Data distribution strategy
File organization
Indexes
Integrity constraints
Data Volume and Usage Analysis
Database size
Used to select physical storage devices and estimate cost of storage
Usage paths
Used to select file organization and access methods
Used to plan for use of indexes
Used to plan a strategy for data distribution
Composite Usage Map
[Figure: composite usage map annotated with data volumes and access frequencies (per hour)]
Usage analysis:
200 purchased parts accessed per hour
80 quotations accessed from these 200 purchased part accesses
70 suppliers accessed from these 80 quotation accesses
Composite Usage Map
Usage analysis:
75 suppliers accessed per hour
40 quotations accessed from these 75 supplier accesses
40 purchased parts accessed from these 40 quotation accesses
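The two usage analyses above can be read as follow-through ratios along each access path, which is one way to judge which paths deserve indexes or clustering. The sketch below is only an illustration: the access counts come from the slides, while the variable names and the `follow_through_rates` helper are hypothetical.

```python
# Access counts per hour, taken from the two usage analyses above.
part_entry_path = [("purchased parts", 200), ("quotations", 80), ("suppliers", 70)]
supplier_entry_path = [("suppliers", 75), ("quotations", 40), ("purchased parts", 40)]


def follow_through_rates(path):
    """Return, for each hop, the fraction of accesses that continue to the next entity."""
    rates = []
    for (source, source_count), (target, target_count) in zip(path, path[1:]):
        rates.append((source, target, target_count / source_count))
    return rates


for path in (part_entry_path, supplier_entry_path):
    for source, target, rate in follow_through_rates(path):
        print(f"{source} -> {target}: {rate:.1%}")
```

A hop with a high ratio (for example, quotations to suppliers on the first path) is a candidate for a fast access path between the two record types.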
Data Distribution Strategies
Different approaches to determine at which nodes or sites to physically locate the data in a distributed computing network
Four strategies
Centralized
Partitioned
Replicated
Hybrid
Centralized
All data are located at a single site
Advantage
Simple implementation
Disadvantages
Data not readily accessible to remote users
Expensive data communication costs
When the central system crashes, the entire database system fails
Partitioned
Database is divided into nonoverlapping partitions or fragments, which are assigned to particular sites
Advantage
Data is more accessible to local users
Disadvantage
More complex implementation
Replicated
Duplicate copies of the entire database are assigned to more than one site in the network
Advantage
Maximizes local access to data
Disadvantage
Update problems (synchronization)
Hybrid
Database is partitioned into critical and non-critical fragments
Critical fragments are stored at multiple sites, while non-critical fragments are stored at only one site
What are the advantages and disadvantages of this approach?
File Organization
How records are physically arranged or stored on secondary storage devices
Example
Storage on hard disks, tapes, CD-ROMs, etc.
Basic File Organizations
Sequential
Indexed (indexed sequential or indexed non-sequential)
Hashed
Sequential File Organization
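Records in the file are stored in sequence according to a primary key value
[Figure: records 1, 2, ..., n stored one after another]
If sorted
Every insert or delete requires a re-sort
If not sorted
Average time to find the desired record = n/2 record reads, where n is the number of records in the file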
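The n/2 figure above can be checked with a small simulation. This is a minimal sketch, assuming every record is equally likely to be requested and that one unit of time is one record read; `sequential_search` and the sample file are hypothetical names, not part of the slides.

```python
def sequential_search(records, key):
    """Scan from the start; return (position, records_read), or (None, records_read) on a miss."""
    for reads, record_key in enumerate(records, start=1):
        if record_key == key:
            return reads - 1, reads
    return None, len(records)


n = 1000
unsorted_file = list(range(n, 0, -1))  # keys n, n-1, ..., 1, stored in no useful order

# Look up every key once and average the number of records read per lookup.
total_reads = sum(sequential_search(unsorted_file, key)[1] for key in unsorted_file)
average_reads = total_reads / n
print(average_reads)  # (n + 1) / 2, which is roughly n/2 for large n
```

The exact average is (n + 1)/2; the slide's n/2 is the usual approximation for large files.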
Indexed File Organization
An index is created that allows the user to locate individual records faster
Index
A table or other data structure used to determine the location of rows in the main table that satisfy some condition
Indexed Sequential
Records are stored sequentially by primary key value
Uses a block index
Example:
White pages phone directory
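A block index works much like the guide words at the top of a phone directory page: it narrows the search to one block, which is then scanned. The sketch below assumes the index stores the highest key of each block; the names, phone numbers, and the `lookup` helper are made up for illustration.

```python
from bisect import bisect_left

# Records sorted by primary key and grouped into fixed-size blocks (illustrative data).
blocks = [
    [("Adams", "555-0101"), ("Baker", "555-0102"), ("Clark", "555-0103")],
    [("Davis", "555-0104"), ("Evans", "555-0105"), ("Frank", "555-0106")],
    [("Green", "555-0107"), ("Hill", "555-0108"), ("Irwin", "555-0109")],
]

# The block index holds one entry per block: the highest key stored in that block.
block_index = [block[-1][0] for block in blocks]


def lookup(key):
    """Use the block index to pick one block, then scan only that block."""
    block_no = bisect_left(block_index, key)
    if block_no == len(blocks):
        return None  # key is larger than every key in the file
    for record_key, value in blocks[block_no]:
        if record_key == key:
            return value
    return None


print(lookup("Evans"))
```

Because the index has one entry per block rather than one per record, it stays small, which is why this organization depends on the records being stored in key order.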
Indexed Non-Sequential
Records are stored non-sequentially
Full index is required
Example
Books in a library
Hashed File Organization
A hashing algorithm is used to determine the address of each record
Hashing algorithm
Converts a primary key value into a relative record number or file address
Example: Divide the primary key value by a prime number and use the remainder as the storage location
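The division-remainder example above might look like the following sketch. The prime 997 and the names `PRIME`, `hash_address`, and `file_slots` are assumptions chosen for illustration; the keys are the product numbers used later in these slides, treated as integers.

```python
PRIME = 997  # hypothetical number of storage locations; a prime, as the slide suggests


def hash_address(primary_key):
    """Division-remainder hashing: the remainder is the relative record number."""
    return primary_key % PRIME


file_slots = {}  # relative record number -> list of keys (a list absorbs collisions)

for product_no in (100, 350, 975, 1000, 1250, 1425, 1775):
    file_slots.setdefault(hash_address(product_no), []).append(product_no)

print(hash_address(1250))  # 1250 - 997 = 253
print(hash_address(1097) == hash_address(100))  # two keys can map to the same address
```

Two different keys can land on the same address (a collision), so a real hashed file also needs an overflow policy; the list per slot here is only a stand-in for one.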
Selecting File Organization
Select a file organization that provides a reasonable balance among the following criteria:
Fast access for retrieval
High throughput for processing transactions
Efficient use of storage devices
Protection from failures or data loss
Minimal need for reorganization
Accommodation for file growth
Security from unauthorized use
Constraints in Selecting File Organization
Physical characteristics of secondary storage devices
Available operating system and file management software
User needs for storing and accessing data
Indexes
Stored in main memory for faster searching of required values
Types of index
Primary key
Non-key
Clustering
Types of Indexes
Primary key
Index created based on the primary key
Non-key
Index created for each desired non-key attribute
Clustering
Speeds up retrievals by physically ordering the file or table based on a non-key attribute
Clustering Indexes
Clustering attribute
Any non-key attribute used to group together rows that have a common value for the attribute
Clustering index
Index defined on the clustering attribute of a table
Clustering Index: An Example

DESCRIPTION INDEX (Non-clustered)
DESCRIPTION   RECORD NO.
Bookcase      1
Chair         3, 5
Dresser       2, 6, 7
Stand         4

PRODUCT TABLE
RECORD NO.   PRODUCT NO.   DESCRIPTION   FINISH   PRICE
1            0100          Bookcase      Oak      75
2            0350          Dresser       Maple    625
3            0975          Chair         Cherry   100
4            1000          Stand         Pine     750
5            1250          Chair         Maple    125
6            1425          Dresser       Oak      800
7            1775          Dresser       Pine     1200
Clustering Index: An Example

DESCRIPTION INDEX (Clustered)
DESCRIPTION   RECORD NO.
Bookcase      1
Chair         2
Dresser       4
Stand         7

PRODUCT TABLE
RECORD NO.   PRODUCT NO.   DESCRIPTION   FINISH   PRICE
1            0100          Bookcase      Oak      75
2            0975          Chair         Cherry   100
3            1250          Chair         Maple    125
4            0350          Dresser       Maple    625
5            1425          Dresser       Oak      800
6            1775          Dresser       Pine     1200
7            1000          Stand         Pine     750
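The difference between the two examples can be reproduced in a few lines. The rows below are the PRODUCT table from the slides; the function names are hypothetical, and the sketch only mimics the idea (a real DBMS reorders the stored pages, not a Python list).

```python
# Rows of the PRODUCT table from the example, in their original (product number) order.
products = [
    ("0100", "Bookcase", "Oak", 75),
    ("0350", "Dresser", "Maple", 625),
    ("0975", "Chair", "Cherry", 100),
    ("1000", "Stand", "Pine", 750),
    ("1250", "Chair", "Maple", 125),
    ("1425", "Dresser", "Oak", 800),
    ("1775", "Dresser", "Pine", 1200),
]


def non_clustered_index(rows):
    """Map each description to every record number (1-based) where it occurs."""
    index = {}
    for record_no, row in enumerate(rows, start=1):
        index.setdefault(row[1], []).append(record_no)
    return index


def clustered_index(rows):
    """Physically reorder rows by description; index only the first record of each group."""
    reordered = sorted(rows, key=lambda row: row[1])  # stable sort keeps product order per group
    index = {}
    for record_no, row in enumerate(reordered, start=1):
        index.setdefault(row[1], record_no)
    return reordered, index


print(non_clustered_index(products))
print(clustered_index(products)[1])
```

With the clustered layout the index needs only one pointer per description, because all rows with that description sit next to each other.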
Trees
Most common data structure for implementing indexes
Branching factor
Degree of a tree
Maximum number of children allowed per parent
Depth
Number of levels between the root node and a leaf node in a tree
Balanced Trees
Also called B-Trees
A tree in which all leaves are at the same distance from the root
Index files are most commonly organized using B-trees, which have predictable efficiency
Also support sequential retrieval of records
Using B-Trees in Indexes
Uses a tree search
Average time to find the desired record = depth of the tree
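To see why depth is a small number in practice, the sketch below estimates how many levels a tree needs for a given number of records. It assumes an idealized, completely full tree in which each level multiplies the reachable entries by the branching factor; `levels_needed` is a hypothetical helper, not a formula from the slides.

```python
def levels_needed(num_records, branching_factor):
    """Smallest number of levels whose fan-out can address num_records entries."""
    levels = 1
    capacity = branching_factor
    while capacity < num_records:
        capacity *= branching_factor
        levels += 1
    return levels


num_records = 1_000_000
print(levels_needed(num_records, 100))  # levels to search with a branching factor of 100
print(num_records // 2)                 # average reads for a sequential scan, for comparison
```

Under these assumptions a million records are reachable in three levels, compared with about half a million reads for an unsorted sequential file.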
Main Trade-Off of Using an Index
Improved performance for retrievals versus degraded performance for inserting, deleting, and updating records in a table
Examples
Decision Support Systems (DSS)
Transaction Processing Systems (TPS)
When to Use Indexes
Specify a unique index for the primary key attribute of each table
In most situations, it is also advisable to specify an index for foreign keys
Specify an index for non-key attributes that are referred to in qualification, sorting, and grouping commands
When to Use Indexes
Index search fields
Index only large tables (an index helps when there are more than 100 values, but not when there are fewer than 30)
Null values will not be referenced from an index
Use indexes heavily only for non-volatile databases
Integrity Constraints
Business rules that preserve the integrity of the data
Four types
Default value
Domain
Null value
Referential integrity
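The first three constraint types can be declared directly in a table definition. The sketch below uses Python's built-in `sqlite3` module; the table and columns echo the PRODUCT example from these slides, but the specific rules (default finish 'Oak', price greater than zero) are assumptions made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE product (
        product_no  TEXT PRIMARY KEY,
        description TEXT NOT NULL,             -- null value constraint
        finish      TEXT DEFAULT 'Oak',        -- default value constraint
        price       INTEGER CHECK (price > 0)  -- domain constraint
    )
    """
)

# Omitting finish lets the DBMS supply the default value.
conn.execute("INSERT INTO product (product_no, description, price) VALUES ('0100', 'Bookcase', 75)")
default_finish = conn.execute("SELECT finish FROM product WHERE product_no = '0100'").fetchone()[0]


def rejected(sql):
    """Return True if the DBMS refuses the statement because it violates a constraint."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


null_rejected = rejected("INSERT INTO product (product_no, description, price) VALUES ('0350', NULL, 625)")
domain_rejected = rejected("INSERT INTO product (product_no, description, price) VALUES ('0975', 'Chair', -100)")
print(default_finish, null_rejected, domain_rejected)
```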
Referential Integrity
Considers the validity of references between objects in a database
The value of a foreign key in one table (referencing table) must be an actual value of a primary key in some other table (referenced table), or else it must be null, if allowed
Referential Integrity Rules
Insertion Rule
A row cannot be inserted in the referencing table unless a matching entry already exists in the referenced table
If insertion is allowed even without a matching entry in the referenced table, a null value is used for the foreign key in the referencing table
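The insertion rule can be observed with a small `sqlite3` session. This is a sketch under assumptions: the `supplier` and `quotation` tables and their columns are hypothetical stand-ins for a referenced and a referencing table, and SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when this is on
conn.execute("CREATE TABLE supplier (supplier_id INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE quotation ("
    " quote_id INTEGER PRIMARY KEY,"
    " supplier_id INTEGER REFERENCES supplier (supplier_id))"  # nullable foreign key
)
conn.execute("INSERT INTO supplier VALUES (1)")

conn.execute("INSERT INTO quotation VALUES (10, 1)")     # matching supplier exists: accepted
conn.execute("INSERT INTO quotation VALUES (11, NULL)")  # null foreign key: accepted when nulls are allowed

try:
    conn.execute("INSERT INTO quotation VALUES (12, 99)")  # no supplier 99: rejected
    orphan_inserted = True
except sqlite3.IntegrityError:
    orphan_inserted = False

quotation_count = conn.execute("SELECT COUNT(*) FROM quotation").fetchone()[0]
print(orphan_inserted, quotation_count)
```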
Referential Integrity Rules
Deletion Rule
A row cannot be deleted from the referenced table if there are matching rows in the referencing table
Three applicable rules
Restrict
Nullify
Cascade

Delete Rules

Restrict
Deletion is not allowed
Nullify
Foreign key values are changed to null in the referencing table before the corresponding row in the referenced table is deleted
Cascade
Affected rows in the referencing table are deleted first, before the matching row in the referenced table is deleted
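The three delete rules map onto the `ON DELETE` actions of SQL foreign keys (`RESTRICT`, `SET NULL`, `CASCADE`). The sketch below runs the same delete under each rule using `sqlite3`; the table names and the `delete_supplier` helper are hypothetical.

```python
import sqlite3


def delete_supplier(on_delete):
    """Delete a referenced supplier row under the given rule; report what is left."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE supplier (supplier_id INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE quotation ("
        " quote_id INTEGER PRIMARY KEY,"
        f" supplier_id INTEGER REFERENCES supplier (supplier_id) ON DELETE {on_delete})"
    )
    conn.execute("INSERT INTO supplier VALUES (1)")
    conn.execute("INSERT INTO quotation VALUES (10, 1)")
    try:
        conn.execute("DELETE FROM supplier WHERE supplier_id = 1")
    except sqlite3.IntegrityError:
        pass  # Restrict: the DBMS refuses the delete, so nothing changes
    suppliers_left = conn.execute("SELECT COUNT(*) FROM supplier").fetchone()[0]
    quotations = conn.execute("SELECT quote_id, supplier_id FROM quotation").fetchall()
    conn.close()
    return suppliers_left, quotations


for rule in ("RESTRICT", "SET NULL", "CASCADE"):
    print(rule, delete_supplier(rule))
```

Under Restrict both rows survive, under Nullify the quotation keeps its row but loses its supplier reference, and under Cascade both rows are gone.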
Enforcing Referential Integrity
Enforcing referential integrity in application programs
Unreliable -- may be handled differently in separate programs and cause conflicts
Enforcing referential integrity constraints within the DBMS
Consistent enforcement of rules
Makes programming and maintenance easier
Denormalization
Database may not always be implemented in normalized form
Used to speed up data access
Reduces number of tables that must be accessed to retrieve data
No hard and fast rules
Denormalization
Situations to consider denormalization
One-to-one relationship between two entities
Many-to-many relationship with non-key attributes
Reference data
Denormalization of One-to-One
[Figure: E-R diagram. STUDENT (Student_ID, Name, Address) has SCHOLARSHIP APPLICATION (Application_ID, Application_Date, Status)]
Denormalized relation:
STUDENT (Student_ID, Name, Address, Application_Date, Status)
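The one-to-one case amounts to folding the application attributes into each student row. The sketch below shows this with plain dictionaries; the attribute names follow the slide, while the sample rows and the `denormalize` helper are invented for illustration.

```python
# Normalized relations (illustrative rows; the attribute names follow the slide).
students = {
    "S1": {"Name": "Ana", "Address": "12 Elm St"},
    "S2": {"Name": "Ben", "Address": "34 Oak Ave"},
}
applications = {  # keyed by Student_ID, since each student has at most one application
    "S1": {"Application_Date": "2024-01-15", "Status": "Pending"},
}


def denormalize(students, applications):
    """Merge each student with its application, if any, into one STUDENT row."""
    merged = []
    for student_id, student in students.items():
        application = applications.get(student_id, {})
        merged.append({
            "Student_ID": student_id,
            **student,
            "Application_Date": application.get("Application_Date"),  # None when no application
            "Status": application.get("Status"),
        })
    return merged


student_relation = denormalize(students, applications)
for row in student_relation:
    print(row)
```

A student without an application ends up with empty (null) application fields, which is the price paid for reading everything from one table.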
Denormalization of Many-to-Many
[Figure: E-R diagram. VENDOR (Vendor_ID, Vendor_Name, Address) submits PRICE QUOTE (Vendor_ID, Item_ID, Price), which is given for ITEM (Item_ID, Description)]
Denormalized relations:
VENDOR (Vendor_ID, Vendor_Name, Address)
ITEM_QUOTE (Vendor_ID, Item_ID, Description, Price)
Denormalization of Reference Data

[Figure: E-R diagram. STORAGE (Storage_ID, Container_No, Cabinet_No) stores ITEM (Item_ID, Description, Storage_ID)]
Denormalized relation:
STORAGE (Item_ID, Description, Container_No, Cabinet_No)
