3: Database Systems: Part V: Physical Database Design
The process of mapping the logical data model into an internal set of physical database structures
Major consideration:
Can the user get the desired information, in the appropriate format, and in a timely (i.e. acceptable response time) fashion?
Implement the database as a set of stored records, files, indexes, etc.
Provide adequate performance
Ensure database integrity, security, and recoverability

Size of database and frequency of use
Response time, security, backup, recovery, and retention of data

Data volume and usage analysis
Data distribution strategy
File organization
Indexes
Integrity constraints
Database size
Used to select physical storage devices and estimate cost of storage
Used to select file organization and access methods
Plan for use of indexes
Strategy for data distribution
Usage paths
Data volumes
Different approaches to determine at which nodes or sites to physically locate the data in a distributed computing network
Four strategies:
Centralized
Advantage: simple implementation
Disadvantages:
  Data not readily accessible to remote users
  Expensive data communication costs
  When central system crashes, entire database system fails
Partitioned
Database is divided into nonoverlapping partitions or fragments, which are assigned to particular sites
Advantage
Disadvantage
Replicated
Duplicate copies of the entire database are assigned to more than one site in the network
Advantage
Disadvantage
Hybrid
Database is partitioned into critical and non-critical fragments
Critical fragments are stored at multiple sites, while non-critical fragments are stored at only one site
What are the advantages and disadvantages of this approach?
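The hybrid placement rule can be sketched in a few lines of Python. The fragment names, site names, and the `place_fragments` helper are hypothetical and exist only for illustration. The sketch also assumes that "multiple sites" means every site and that each non-critical fragment has one designated home site.

```python
def place_fragments(fragments, sites):
    """Return {site: sorted list of fragment names} under a hybrid strategy.

    fragments: dict mapping fragment name -> (is_critical, home_site)
    sites: list of site names
    """
    placement = {site: [] for site in sites}
    for name, (is_critical, home_site) in fragments.items():
        if is_critical:
            # Assumption: critical fragments are replicated at every site.
            for site in sites:
                placement[site].append(name)
        else:
            # Non-critical fragments are stored only at their home site.
            placement[home_site].append(name)
    return {site: sorted(names) for site, names in placement.items()}


# Hypothetical fragments and sites, for illustration only.
fragments = {
    "customer_accounts": (True, "site_a"),
    "archived_orders": (False, "site_b"),
}
placement = place_fragments(fragments, ["site_a", "site_b"])
```

Here `customer_accounts` ends up at both sites, while `archived_orders` is reachable only through `site_b`.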
File Organization
How records are physically arranged or stored on secondary storage devices
Example
Types of file organization:
Sequential
Indexed
Hashed
Records in the file are stored in sequence according to a primary key value
If sorted: every insert or delete requires a re-sort
If not sorted: average time to find the desired record = n/2 record reads, where n is the number of records in the file
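The n/2 figure can be checked with a short Python sketch that scans a file front to back and counts the records read. The file contents and the `records_read` helper are hypothetical. For successful searches the exact average is (n + 1)/2, which the n/2 rule of thumb approximates for large n.

```python
def records_read(keys, target):
    """Scan keys front to back; return how many records were read to find target."""
    for position, key in enumerate(keys, start=1):
        if key == target:
            return position
    return len(keys)  # target absent: every record was read


n = 1000
keys = list(range(n))  # hypothetical file of n records
# Average over one successful search for each record in the file.
average_reads = sum(records_read(keys, k) for k in keys) / n
```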
An index is created that allows the user to locate individual records faster
Index: a table or other data structure used to determine the location of rows in the main table that satisfy some condition
Indexed Sequential
Records are stored sequentially by primary key value
Uses a block index
Example:
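A block index can be pictured with the following Python sketch. It assumes the index holds the lowest key of each block, which is one common arrangement; the record data and the helper names `build_block_index` and `lookup` are hypothetical.

```python
from bisect import bisect_right


def build_block_index(blocks):
    """Return the lowest key of each block; blocks are sorted lists of (key, record)."""
    return [block[0][0] for block in blocks]


def lookup(blocks, block_index, key):
    """Find a record by key: search the block index, then scan one block."""
    # bisect_right gives the number of blocks whose lowest key is <= key.
    i = bisect_right(block_index, key) - 1
    if i < 0:
        return None  # key is smaller than every stored key
    for k, record in blocks[i]:
        if k == key:
            return record
    return None


# Hypothetical file: records stored in primary-key order, three per block.
blocks = [
    [(101, "Adams"), (105, "Baker"), (110, "Chen")],
    [(120, "Diaz"), (125, "Evans"), (131, "Ford")],
    [(140, "Gray"), (150, "Hill"), (162, "Ito")],
]
block_index = build_block_index(blocks)
```

A lookup for key 125 consults the three-entry index, goes straight to the second block, and reads at most three records instead of scanning the whole file.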
Indexed Non-Sequential
Books in a library
A hashing algorithm is used to determine the address of each record
Hashing algorithm: converts a primary key value into a relative record number or file address
Example: divide the primary key value by a prime number and use the remainder as the storage location
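The division-remainder method is short enough to show directly. In this Python sketch the prime 997 and the key values are arbitrary choices for illustration, and `hash_address` is a hypothetical helper name.

```python
PRIME = 997  # assumed prime divisor; chosen only for the sketch


def hash_address(primary_key, prime=PRIME):
    """Division-remainder hashing: the remainder is the relative record number."""
    return primary_key % prime


# Hypothetical primary key values.
addresses = {key: hash_address(key) for key in (12345, 67890, 13342)}
```

Keys 12345 and 13342 both map to location 381. Two keys hashing to the same address is a collision, and a real hashed file needs some way to resolve it; the sketch only makes the collision visible.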
Select a file organization that provides a reasonable balance among the following criteria:
Fast access for retrieval
High throughput for processing transactions
Efficient use of storage devices
Protection from failures or data loss
Minimal need for reorganization
Accommodation for file growth
Security from unauthorized use
Physical characteristics of secondary storage devices
Available operating system
File management software
User needs for storing and accessing data
Indexes
Stored in main memory for faster searching of required values
Types of Indexes
Primary key
Non-key
Clustering
Clustering Indexes
Clustering attribute: any non-key attribute used to group together rows that have a common value for the attribute
Clustering index: index defined on the clustering attribute of a table
PRODUCT TABLE
RECORD NO.  DESCRIPTION  FINISH  PRICE
1           Bookcase     Oak     75
2           Dresser      Maple   625
3           Chair        Cherry  100
4           Stand        Pine    750
5           Chair        Maple   125
6           Dresser      Oak     800
7           Dresser      Pine    1200
PRODUCT TABLE
RECORD NO.  DESCRIPTION  FINISH  PRICE
1           Bookcase     Oak     75
2           Chair        Cherry  100
3           Chair        Maple   125
4           Dresser      Maple   625
5           Dresser      Oak     800
6           Dresser      Pine    1200
7           Stand        Pine    750
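A clustering index over this table can be sketched in Python. The sketch assumes DESCRIPTION is the clustering attribute, since the second table groups rows by it; `build_clustering_index` and `rows_for` are hypothetical helper names.

```python
# Rows of the PRODUCT table grouped by DESCRIPTION:
# (record number, description, finish, price).
products = [
    (1, "Bookcase", "Oak", 75),
    (2, "Chair", "Cherry", 100),
    (3, "Chair", "Maple", 125),
    (4, "Dresser", "Maple", 625),
    (5, "Dresser", "Oak", 800),
    (6, "Dresser", "Pine", 1200),
    (7, "Stand", "Pine", 750),
]


def build_clustering_index(rows):
    """Map each DESCRIPTION value to the record number of its first row."""
    index = {}
    for record_no, description, _finish, _price in rows:
        index.setdefault(description, record_no)
    return index


def rows_for(rows, index, description):
    """Jump to the first matching row, then read forward while the value matches."""
    start = index.get(description)
    if start is None:
        return []
    matches = []
    for row in rows[start - 1:]:  # record numbers are 1-based
        if row[1] != description:
            break
        matches.append(row)
    return matches


clustering_index = build_clustering_index(products)
```

Because rows with the same DESCRIPTION are stored together, the index needs only one entry per distinct value, and all matching rows are read in one forward pass.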
Trees
Degree of a tree: maximum number of children allowed per parent
Depth: number of levels between the root node and a leaf node in a tree
Balanced Trees
Also called B-trees
A tree in which all leaves are the same distance from the root
Index files are most commonly organized using B-trees, which have predictable efficiency
Also support sequential retrieval of records
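One reason the efficiency is predictable: when every leaf is the same distance from the root, the number of levels to traverse depends only on the degree and the number of entries. The Python sketch below makes the simplifying assumption that every node has exactly `degree` children. Real B-trees keep nodes only partly full, so actual depth can be somewhat larger. `levels_needed` is a hypothetical helper name.

```python
def levels_needed(num_entries, degree):
    """Smallest depth h such that degree**h >= num_entries."""
    if degree < 2:
        raise ValueError("degree must be at least 2")
    levels, capacity = 0, 1
    while capacity < num_entries:
        capacity *= degree  # each extra level multiplies reachable leaves by degree
        levels += 1
    return levels
```

Under this assumption, a tree of degree 100 reaches one million entries in three levels, so any lookup touches the same small number of nodes.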
Improved performance for retrievals versus degraded performance for inserting, deleting, and updating records in a table
Examples
Specify a unique index for the primary key attribute of each table
In most situations, it is also advisable to specify an index for foreign keys
Specify an index for non-key attributes that are referred to in qualification, sorting, and grouping commands
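The guideline about non-key attributes can be tried out with Python's built-in `sqlite3` module. This is a minimal sketch, not a tuning recipe: the `product` table reuses three rows from the PRODUCT example, the index name `idx_product_finish` is made up for illustration, and the exact wording of the query plan differs between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product ("
    " product_id INTEGER PRIMARY KEY,"  # SQLite keys the table on this column
    " description TEXT, finish TEXT, price INTEGER)"
)
conn.executemany(
    "INSERT INTO product (description, finish, price) VALUES (?, ?, ?)",
    [("Bookcase", "Oak", 75), ("Dresser", "Maple", 625), ("Chair", "Cherry", 100)],
)
# Index a non-key attribute that appears in qualification (WHERE) clauses.
conn.execute("CREATE INDEX idx_product_finish ON product (finish)")

# Ask SQLite how it would run the query; the last column describes each step.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM product WHERE finish = 'Oak'"
).fetchall()
plan_text = " ".join(row[-1] for row in plan)
```

`plan_text` names `idx_product_finish`, which shows the query is answered through the index rather than by scanning every row.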
Index search fields
Index only large tables (when there are >100 values but not when there are <30 values)
Null values will not be referenced from an index
Remember, only use indexes heavily for non-volatile databases
Integrity Constraints
Business rules that preserve the integrity of the data
Four types
Referential Integrity
Considers the validity of references between objects in a database
The value of a foreign key in one table (referencing table) must be an actual value of a primary key in some other table (referenced table), or else it must be null, if allowed
Insertion Rule
A row cannot be inserted in the referencing table unless a matching entry already exists in the referenced table
If insertion is allowed even without a matching entry in the referenced table, a null value is used for the foreign key in the referencing table
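The insertion rule can be observed with Python's built-in `sqlite3` module. The `customer` and `orders` tables are hypothetical. Note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`; other database systems enforce them by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE orders ("
    " order_id INTEGER PRIMARY KEY,"
    " customer_id INTEGER REFERENCES customer (customer_id))"  # nullable foreign key
)
conn.execute("INSERT INTO customer VALUES (1)")


def try_insert(order_id, customer_id):
    """Return True if the row was accepted, False if the foreign key check rejected it."""
    try:
        conn.execute("INSERT INTO orders VALUES (?, ?)", (order_id, customer_id))
        return True
    except sqlite3.IntegrityError:
        return False


results = [
    try_insert(10, 1),     # matching customer exists
    try_insert(11, 99),    # no such customer
    try_insert(12, None),  # null foreign key, allowed because the column is nullable
]
```

The first and third inserts succeed; the second is rejected because customer 99 does not exist in the referenced table.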
Deletion Rule
A row cannot be deleted from the referenced table if there are matching rows in the referencing table
Three delete rules: Restrict, Nullify, Cascade
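Delete Rules
Restrict: a row in the referenced table cannot be deleted while matching rows exist in the referencing table
Nullify: when a referenced row is deleted, the foreign key in each matching referencing row is set to null
Cascade: when a referenced row is deleted, all matching rows in the referencing table are deleted as well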
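The three delete rules can be compared side by side with Python's built-in `sqlite3` module. The tables and the `delete_parent` helper are hypothetical. SQL spells the nullify rule as `ON DELETE SET NULL`.

```python
import sqlite3


def delete_parent(rule):
    """Delete a referenced row under the given ON DELETE rule.

    Returns (parent_deleted, remaining rows of the referencing table).
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
    conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE orders ("
        " order_id INTEGER PRIMARY KEY,"
        " customer_id INTEGER REFERENCES customer (customer_id)"
        f" ON DELETE {rule})"
    )
    conn.execute("INSERT INTO customer VALUES (1)")
    conn.execute("INSERT INTO orders VALUES (10, 1)")
    try:
        conn.execute("DELETE FROM customer WHERE customer_id = 1")
        deleted = True
    except sqlite3.IntegrityError:
        deleted = False  # the rule blocked the delete
    children = conn.execute("SELECT order_id, customer_id FROM orders").fetchall()
    conn.close()
    return deleted, children


outcomes = {rule: delete_parent(rule) for rule in ("RESTRICT", "SET NULL", "CASCADE")}
```

Under RESTRICT the delete is refused and the order row is untouched, under SET NULL the order survives with a null `customer_id`, and under CASCADE the order is removed along with the customer.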
Denormalization
Database may not always be implemented in normalized form
Used to speed up data access
Reduces number of tables that must be accessed to retrieve data
No hard and fast rules
Denormalization
One-to-one relationship between two entities
Many-to-many relationship with non-key attributes
Reference data
Denormalization of One-to-One
[Figure: ER diagram. STUDENT (Student_ID, Name, Address) has SCHOLARSHIP APPLICATION (Application_ID, Application_Date, Status)]
Denormalized relation:
Denormalization to Many-to-Many
[Figure: ER diagram. VENDOR (Vendor_ID, Vendor_Name, Address) submits PRICE QUOTE (Vendor_ID, Item_ID, Price), which is given for ITEM (Item_ID, Description)]
Denormalized relations:
VENDOR (Vendor_ID, Vendor_Name, Address)
ITEM_QUOTE (Vendor_ID, Item_ID, Description, Price)
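The merge of ITEM and PRICE QUOTE into ITEM_QUOTE can be sketched in Python. The sample rows and the `denormalize` helper are hypothetical; only the relation layouts come from the slide.

```python
# Normalized relations (hypothetical sample rows).
item = {"I1": "Desk lamp", "I2": "Bookcase"}  # Item_ID -> Description
price_quote = [  # (Vendor_ID, Item_ID, Price)
    ("V1", "I1", 20),
    ("V2", "I1", 18),
    ("V1", "I2", 75),
]


def denormalize(item, price_quote):
    """Merge ITEM into PRICE QUOTE: ITEM_QUOTE (Vendor_ID, Item_ID, Description, Price)."""
    return [
        (vendor_id, item_id, item[item_id], price)
        for vendor_id, item_id, price in price_quote
    ]


item_quote = denormalize(item, price_quote)
```

A quote and its item description now come from a single relation. The cost is redundancy: the description of item I1 is stored once per quote, so a change to it must be applied in every matching row.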
[Figure: ER diagram. STORAGE stores ITEM; attributes shown: Storage_ID, Item_ID]