
File Organization in RDBMS


Databases are used to store information. Normally, the principal operations
we need to perform on a database are those relating to:
 Creating data: create, insert
 Retrieving data: select
 Modifying data: update
 Deleting information that we are sure is no longer useful or valid: delete
In terms of storage, a database needs to store a set of tables, where each
table can be stored as an independent file. The attributes in a table are
closely related and are therefore often accessed together, so it makes sense
to store the different attribute values of each record contiguously.
File organization is the way the files are arranged on the disk, and the
access method is how the data can be retrieved based on that file organization.
Physical Database Design Issues
Database design involves logical design, carried out with the help of E-R
diagrams, normalization, etc., followed by physical design.
The key issues in physical database design are:
 The purpose of physical database design is to translate the logical
description of data into the technical specifications for storing and retrieving
data for the DBMS.
 The goal is to create a design for storing data that will provide adequate
performance and ensure database integrity, security and recoverability.
Some of the basic inputs required for Physical Database Design are:
 Normalized relations
 Attribute definitions
 Data usage: entered, retrieved, deleted, updated
 Requirements for security, backup, recovery, retention, integrity
 DBMS characteristics.
 Performance criteria such as response time requirement with respect to
volume estimates
Physical Records
These are the records that are stored on secondary storage devices. For a
database relation, a physical record is a group of fields stored in adjacent
memory locations and retrieved together as a unit. In a paged memory system,
a data page is the amount of data read or written in one I/O operation
between the secondary storage device and main memory. In this context, the
blocking factor is defined as the number of physical records per page.
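As a rough illustration of the blocking factor, the sketch below computes how many fixed-length records fit in one page and how many pages a table would then occupy. The function names, the 4096-byte page and the 100-byte record are assumptions chosen for the example; a real DBMS also reserves space for headers and may store variable-length records.

```python
def blocking_factor(page_size: int, record_size: int, page_header: int = 0) -> int:
    """Number of whole fixed-length records that fit in one page."""
    if record_size <= 0:
        raise ValueError("record_size must be positive")
    usable = page_size - page_header          # bytes left after any page header
    return max(usable // record_size, 0)      # records never span two pages here


def pages_needed(num_records: int, bf: int) -> int:
    """Pages required to hold num_records at the given blocking factor."""
    if bf <= 0:
        raise ValueError("record does not fit in a page")
    return -(-num_records // bf)              # ceiling division


bf = blocking_factor(page_size=4096, record_size=100)
print(bf)                        # 40
print(pages_needed(10_000, bf))  # 250
```

With these assumed sizes, 96 bytes of every page stay unused, which is why the choice of record size relative to page size matters in physical design.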
The issues relating to the Design of the Physical Database Files
A physical file is a file as stored on the disk.
The main issues relating to physical files are:
 Constructs to link two pieces of data:
    o Sequential storage
    o Pointers
 File organization: how are the files arranged on the disk?
 Access method: how can the data be retrieved based on the file
organization?
STORAGE OF DATABASE ON HARD DISKS
A database system gives users a high-level view of the stored data. Underneath,
however, the data is stored as bits and bytes on different storage devices.

Types of Data Storage

For storing the data, there are different types of storage options available.
These storage types differ from one another in speed and accessibility. The
types are:

o Primary Storage
o Secondary Storage
o Tertiary Storage
 Primary Storage − The memory storage that is directly accessible to
the CPU comes under this category. CPU's internal memory
(registers), fast memory (cache), and main memory (RAM) are directly
accessible to the CPU, as they are all placed on the motherboard or
CPU chipset. This storage is typically very small, ultra-fast, and
volatile. Primary storage requires continuous power supply in order to
maintain its state. In case of a power failure, all its data is lost.
 Secondary Storage − Secondary storage devices are used to store
data for future use or as backup. Secondary storage includes memory
devices that are not a part of the CPU chipset or motherboard, for
example, magnetic disks (such as hard disks), optical disks (DVD, CD,
etc.), flash drives, and magnetic tapes.
 Tertiary Storage − Tertiary storage is used to store huge volumes of
data. Since such storage devices are external to the computer system,
they are the slowest in speed. These storage devices are mostly used
to take a backup of an entire system. Optical disks and magnetic
tapes are widely used as tertiary storage.
FILE ORGANISATION AND ITS TYPES
A file organization is a technique for organizing data in secondary memory.
It defines the logical relationship between records and how they are mapped
to disk blocks.
File organization is a way of arranging the records in a file when the file is
stored on the disk. Data files are organized so as to facilitate access to
records and to ensure their efficient storage. These two goals pull against
each other: if rapid access is required, more storage is usually needed to
make it possible, for example to hold an index.
The selection of a file organization depends on two factors:
 Typical DBMS applications need only a small subset of the database at any
given time.
 When a portion of the data is needed, it must be located on disk, copied to
memory for processing, and rewritten to disk if the data was modified.
A DBMS supports several file organization techniques. An important task of
the DBA is to choose a good organization for each file, based on its type of
use. The main techniques are:
1. Heap Files (Unordered Files)
2. Sequential File Organization
3. Indexed File Organization
4. Hashed File Organization
Heap files (Unordered file)
The heap file is also known as an unordered file or pile file. It is the
simplest and most basic type of file organization. The records are stored in
no particular order. The operations we can perform on the records are
insert, retrieve, update and delete.
The features of the heap file or pile file organization are:
 New records can be inserted in any empty space that can accommodate
them.
 When old records are deleted, the occupied space becomes empty and
available for any new insertion.
 If updated records grow, they may need to be relocated (moved) to a new
empty space. The system therefore needs to keep a list of empty spaces.
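The features listed above can be sketched with a toy in-memory model. The class HeapFile and its method names are hypothetical, and fixed-size slots stand in for disk space; the point is only to show the list of empty spaces being reused on insertion and the full scan needed on retrieval.

```python
class HeapFile:
    """Toy in-memory heap file: unordered slots plus a list of free slots."""

    def __init__(self):
        self.slots = []   # slot number -> record, or None if the slot is empty
        self.free = []    # slot numbers released by deletions

    def insert(self, record):
        """Place the record in any free slot, else append at the end."""
        if self.free:
            rid = self.free.pop()
            self.slots[rid] = record
        else:
            rid = len(self.slots)
            self.slots.append(record)
        return rid

    def delete(self, rid):
        """Empty the slot and remember it for later insertions."""
        if self.slots[rid] is None:
            raise KeyError(rid)
        self.slots[rid] = None
        self.free.append(rid)

    def scan(self, predicate):
        """Linear scan: every slot is examined because there is no order."""
        return [r for r in self.slots if r is not None and predicate(r)]


hf = HeapFile()
a = hf.insert({"id": 7, "name": "Asha"})
b = hf.insert({"id": 3, "name": "Ravi"})
hf.delete(a)                                # slot a goes onto the free list
c = hf.insert({"id": 9, "name": "Meera"})   # reuses the freed slot
print(c == a)                               # True
```

Insertion never searches for a position, which is why it is cheap; retrieval by any field has to call scan over the whole file.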

Advantages of heap files
1. This is a simple file organization method.
2. Insertion is relatively efficient, since a record can go into any free space.
3. Good for bulk-loading data into a table.
4. Best if file scans are common or insertions are frequent.
Disadvantages of heap files
1. Searching is inefficient for large databases, since finding a record may
require scanning the whole file.
2. Deletion can result in unused space and a need for reorganization.
Sequential File Organization
The most basic way to organize the collection of records in a file is to use
sequential organization. Records of the file are stored in sequence by the
primary key field values. They are accessible only in the order stored, i.e., in
primary key order. This kind of file organization works well for tasks
that need to access nearly every record in a file, e.g., payroll. In a
sequentially organized file, records are written consecutively when the file is
created and must be accessed consecutively when the file is later used for
input.

Structure of Sequential File

A sequential file maintains the records in the logical sequence of its primary
key values. Sequential files are inefficient for random access but are
suitable for sequential access. A sequential file can be stored on devices like
magnetic tape that allow sequential access. However, if a sequential file is
stored on a disk with the key stored separately from the rest of the record,
then only those disk blocks that contain the desired record or records need
to be read. This type of storage allows binary search on sequential file
blocks, thus enhancing the speed of access.
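A minimal sketch of binary search over the blocks of a sequential file follows. It assumes each block is a non-empty list of (key, record) pairs sorted by key, and it counts every block it touches as one simulated disk read; the function name find_record and the sample data are made up for illustration.

```python
def find_record(blocks, key):
    """Binary search over sorted blocks; returns (record, blocks_read)."""
    lo, hi = 0, len(blocks) - 1
    reads = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        block = blocks[mid]            # one simulated disk read
        reads += 1
        if key < block[0][0]:          # key is below this block's first key
            hi = mid - 1
        elif key > block[-1][0]:       # key is above this block's last key
            lo = mid + 1
        else:                          # key can only be inside this block
            for k, rec in block:
                if k == key:
                    return rec, reads
            return None, reads
    return None, reads


# 16 blocks of 4 records each, keys 0..63 in primary key order.
blocks = [[(k, f"rec{k}") for k in range(i, i + 4)] for i in range(0, 64, 4)]
record, reads = find_record(blocks, 37)
print(record, reads)   # rec37 3
```

Reading the file front to back would touch 10 blocks to reach key 37; the binary search touches 3.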
Updating a sequential file usually creates a new file so that the record
sequence on the primary key is maintained. The update operation first copies
into the new file all the records that precede the record to be updated, then
writes the updated record, followed by the remainder of the records. This
method of updating a sequential file automatically creates a backup copy.
Adding a record requires shifting all records from the point of insertion to
the end of the file to create space for the new record. Deletion of a record,
on the other hand, requires compaction of the file space.
The basic advantage of a sequential file is sequential processing, as the next
record is easily accessible despite the absence of any data structure.
However, simple queries are time consuming for large files. A single update
is expensive because a new file must be created. Therefore, to reduce the cost
per update, all update requests are sorted in the order of the sequential file.
This update file is then used to update the sequential file in a single pass.
The file containing the updates is sometimes referred to as a transaction
file, and this process is called the batch mode of updating. In this mode,
each record of the master sequential file is checked for one or more possible
updates by comparing it with the update information in the transaction file.
The records are written to a new master file in sequential order. A record
that requires multiple updates is written only when all the updates have been
performed on it. A record that is to be deleted is not written to the new
master file. Thus, a new updated master file is created from the transaction
file and the old master file, and update, insertion and deletion of records
in a sequential file all require the creation of a new file.
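The batch mode of updating can be sketched as a single merge pass. In this illustration the master file is a list of (key, record) pairs sorted by key and the transaction file is a list of (key, operation, data) tuples; these layouts, the operation names and the function names are assumptions made for the example, not a standard format.

```python
def _apply_run(txns, i, record, out):
    """Apply every transaction for one key, then write the result once."""
    key = txns[i][0]
    while i < len(txns) and txns[i][0] == key:
        _, op, data = txns[i]
        if op == "delete":
            record = None
        elif op == "insert":
            record = dict(data)
        elif op == "update" and record is not None:
            record = {**record, **data}   # update of a missing key is ignored
        i += 1
    if record is not None:                # deleted records are not written
        out.append((key, record))
    return i


def batch_update(master, transactions):
    """Merge a sorted master file with a transaction file into a new master."""
    txns = sorted(transactions, key=lambda t: t[0])  # stable: keeps per-key order
    new_master, i = [], 0
    for key, record in master:
        while i < len(txns) and txns[i][0] < key:    # keys absent from master
            i = _apply_run(txns, i, None, new_master)
        if i < len(txns) and txns[i][0] == key:
            i = _apply_run(txns, i, record, new_master)
        else:
            new_master.append((key, record))         # unchanged record
    while i < len(txns):                             # keys past the last record
        i = _apply_run(txns, i, None, new_master)
    return new_master


old_master = [(1, {"pay": 10}), (3, {"pay": 30}), (5, {"pay": 50})]
transaction_file = [
    (3, "update", {"pay": 35}),
    (2, "insert", {"pay": 20}),
    (5, "delete", None),
    (3, "update", {"grade": "B"}),
]
new_master = batch_update(old_master, transaction_file)
print(new_master)
# [(1, {'pay': 10}), (2, {'pay': 20}), (3, {'pay': 35, 'grade': 'B'})]
```

The old master is never modified, so it remains available as the backup copy mentioned earlier.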

Advantages of Sequential File organization
 It is fast and efficient when dealing with large volumes of data that need to
be processed periodically (batch system).
Disadvantages of sequential File organization
 Requires that all new transactions be sorted into the proper sequence for
sequential access processing.
 Storing, modifying, deleting, or adding records in the file requires
rearranging the file, and locating a record requires reading through the
records that precede it.
 This method is too slow to handle applications requiring immediate
updating or responses.
Indexed (Indexed Sequential) File organization
It organizes the file like a large dictionary, i.e., records are stored in order of
the key but an index is kept which also permits a type of direct access. The
records are stored sequentially by primary key values and there is an index
built over the primary key field.
The retrieval of a record from a sequential file, on average, requires access
to half the records in the file, making such inquiries not only inefficient but
very time consuming for large files. To improve the query response time of a
sequential file, a type of indexing technique can be added.
An index is a set of (index value, address) pairs. Indexing associates a set of
objects with a set of orderable quantities that are usually smaller in number
or simpler in their properties. Thus, an index is a mechanism for faster
search. Although the indices and the data blocks are kept together physically,
they are logically distinct.
Let us use the term index file to describe the indexes and data file to refer
to the data records. An index can be small enough to be read into
main memory. A sequential file (that is, one sorted on its primary key) that
is indexed on its primary key is called an index sequential file. The index
allows random access to records, while the sequential storage of the records
provides easy access to records in sequence.
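The two kinds of access can be sketched as follows. This toy model assumes a sparse index holding the first key of each block, with the list position serving as the block address; the class name IndexedSequentialFile, the block size and the sample keys are all hypothetical.

```python
from bisect import bisect_right


class IndexedSequentialFile:
    """Toy index sequential file: sorted data blocks plus a sparse index."""

    def __init__(self, sorted_records, block_size=4):
        # Data file: records sorted by primary key, grouped into blocks.
        self.blocks = [sorted_records[i:i + block_size]
                       for i in range(0, len(sorted_records), block_size)]
        # Index file: first key of each block; list position = block address.
        self.index_keys = [block[0][0] for block in self.blocks]

    def lookup(self, key):
        """Random access: search the index, then read a single block."""
        pos = bisect_right(self.index_keys, key) - 1
        if pos < 0:
            return None
        for k, rec in self.blocks[pos]:
            if k == key:
                return rec
        return None

    def scan(self, low, high):
        """Sequential access: start at the block for low and read forward."""
        pos = max(bisect_right(self.index_keys, low) - 1, 0)
        for block in self.blocks[pos:]:
            for k, rec in block:
                if k > high:
                    return
                if k >= low:
                    yield rec


records = [(k, f"rec{k}") for k in range(10, 110, 10)]   # keys 10, 20, ..., 100
isf = IndexedSequentialFile(records)
print(isf.index_keys)            # [10, 50, 90]
print(isf.lookup(60))            # rec60
print(list(isf.scan(35, 70)))    # ['rec40', 'rec50', 'rec60', 'rec70']
```

The index here has 3 entries for 10 records, which is what makes it plausible to hold in main memory.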
Advantages of Indexed Sequential File Organization
 In this method, each index entry holds the address of the data block for
its record, so searching for a record in a huge database is quick and easy.
Disadvantages of Indexed Sequential File Organization
 This method requires extra space on the disk to store the index.
 When new records are inserted, the files have to be reorganized to
maintain the sequence.
 When a record is deleted, the space used by it needs to be released.
Otherwise, the performance of the database will degrade.

Hashed File organization


Hashing is the most common form of purely random access to a file or
database. It is also used as an optimization technique to access columns
that do not have an index.
Hash functions calculate the address of the page in which the record is to be
stored based on one or more fields in the record, known as the hash key. The
records in a hash file appear randomly distributed across the available space.
The method requires a hashing algorithm and a technique for handling records
that hash to the same address.

When a record has to be retrieved using the hash key columns, the
address is generated, and the whole record is retrieved using that address.
In the same way, when a new record has to be inserted, the address is
generated using the hash key and the record is inserted directly. The same
process is applied in the case of delete and update.

In this method, there is no need to search or sort the entire file. Each
record is stored at the location its hash address dictates, so the records
are in no particular key order.

A hashing algorithm converts a primary key value into a record address. The
most popular form of hashing is division hashing with chained overflow.
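Division hashing with chained overflow can be sketched as below. The model assumes integer keys, a small prime number of buckets, and a fixed page capacity; the class name HashedFile and all of its parameters are hypothetical, and in-memory lists stand in for disk pages.

```python
class HashedFile:
    """Toy hashed file: division hashing with a chained overflow area."""

    def __init__(self, num_buckets=7, bucket_capacity=2):
        self.num_buckets = num_buckets
        self.capacity = bucket_capacity
        self.primary = [[] for _ in range(num_buckets)]    # home pages
        self.overflow = [[] for _ in range(num_buckets)]   # overflow chains

    def _address(self, key):
        return key % self.num_buckets      # division hashing: remainder

    def insert(self, key, record):
        addr = self._address(key)
        page = self.primary[addr]
        if len(page) < self.capacity:
            page.append((key, record))
        else:                              # home page full: chain the record
            self.overflow[addr].append((key, record))
        return addr

    def search(self, key):
        addr = self._address(key)
        for k, rec in self.primary[addr] + self.overflow[addr]:
            if k == key:
                return rec
        return None

    def delete(self, key):
        addr = self._address(key)
        for chain in (self.primary[addr], self.overflow[addr]):
            for i, (k, _) in enumerate(chain):
                if k == key:
                    del chain[i]
                    return True
        return False


hf = HashedFile(num_buckets=7, bucket_capacity=2)
for key, name in [(10, "a"), (17, "b"), (24, "c")]:   # all hash to address 3
    hf.insert(key, name)
print(hf.search(24))       # c
print(hf.overflow[3])      # [(24, 'c')]
```

A lookup on the hash key touches only one home page and its chain, whereas a range query such as "all keys between 10 and 20" would have to visit every bucket.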
Advantages of Hashed file organization
1. Insertion or search on hash-key is fast.
2. Best if equality search is needed on hash-key.
Disadvantages of Hashed file organization
1. It is a complex file organization method.
2. Search is slow on anything other than the hash key, for example range
searches or searches on other fields.
3. It suffers from disk space overhead.
