File Organization in RDBMS
Several types of storage are available for storing data. These storage types differ from one another in speed and accessibility. The types are:
o Primary Storage
o Secondary Storage
o Tertiary Storage
Primary Storage − The memory storage that is directly accessible to
the CPU comes under this category. CPU's internal memory
(registers), fast memory (cache), and main memory (RAM) are directly
accessible to the CPU, as they are all placed on the motherboard or
CPU chipset. This storage is typically very small, ultra-fast, and
volatile. Primary storage requires a continuous power supply in order to maintain its state. In case of a power failure, all its data is lost.
Secondary Storage − Secondary storage devices are used to store
data for future use or as backup. Secondary storage includes memory
devices that are not a part of the CPU chipset or motherboard, for
example, magnetic disks, optical disks (DVD, CD, etc.), hard disks,
flash drives, and magnetic tapes.
Tertiary Storage − Tertiary storage is used to store huge volumes of
data. Since such storage devices are external to the computer system,
they are the slowest in speed. These storage devices are mostly used
to take a backup of an entire system. Optical disks and magnetic
tapes are widely used as tertiary storage.
FILE ORGANISATION AND ITS TYPES
A file organization is a technique for organizing data in secondary memory. It defines the logical relationship between records and how they are mapped to disk blocks.
File organization is a way of arranging the records in a file when the file is stored on the disk. Data files are organized so as to facilitate access to records and to ensure their efficient storage. If rapid access is required, additional storage (for example, for indexes) is generally needed to make it possible.
The selection of a file organization depends on two factors, as shown below:
o Typical DBMS applications need only a small subset of the database at any given time.
o When a portion of the data is needed, it must be located on disk, copied to memory for processing, and rewritten to disk if it was modified.
A DBMS supports several file organization techniques. An important task of the DBA is to choose a good organization for each file, based on how it is used.
1. Heap Files (Unordered Files)
2. Sequential File Organization
3. Indexed File Organization
4. Hashed File Organization
Heap Files (Unordered Files)
The heap file is also known as an unordered file. It is the simplest and most basic type of file organization: records are stored in no particular order. The operations we can perform on the records are insert, retrieve, and delete.
The features of the heap file (or pile file) organization are:
o New records can be inserted in any empty space that can accommodate them.
o When old records are deleted, the occupied space becomes empty and available for any new insertion.
o If updated records grow, they may need to be relocated (moved) to a new empty space. This requires maintaining a list of empty spaces.
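The following is a minimal Python sketch of these heap file operations; the HeapFile class, its slot-based layout, and the free-slot list are illustrative assumptions rather than the internals of any particular DBMS.

class HeapFile:
    def __init__(self):
        self.slots = []        # records stored in arbitrary (insertion) order
        self.free_slots = []   # indexes of deleted slots, available for reuse

    def insert(self, record):
        # Reuse an empty slot if one exists, otherwise append at the end.
        if self.free_slots:
            slot = self.free_slots.pop()
            self.slots[slot] = record
        else:
            slot = len(self.slots)
            self.slots.append(record)
        return slot

    def delete(self, slot):
        # Mark the slot empty and remember it for future insertions.
        self.slots[slot] = None
        self.free_slots.append(slot)

    def find(self, predicate):
        # Retrieval requires a full linear scan, since records are unordered.
        return [r for r in self.slots if r is not None and predicate(r)]

hf = HeapFile()
s1 = hf.insert({"id": 7, "name": "Asha"})
hf.insert({"id": 3, "name": "Ravi"})
hf.delete(s1)                          # the freed slot goes on the free list
hf.insert({"id": 9, "name": "Mina"})   # reuses the freed slot
print(hf.find(lambda r: r["id"] == 3))

Note that find must scan every occupied slot, which is why retrieval from a large heap file is slow unless an index is built on top of it.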
Structure of a Sequential File
A sequential file maintains its records in the logical sequence of their primary key values. Sequential files are inefficient for random access but are well suited to sequential access. A sequential file can be stored on devices like magnetic tape that allow only sequential access. However, if a sequential file is stored on a disk with the key stored separately from the rest of the record, then only those disk blocks that contain the desired record or records need to be read. This type of storage allows binary search on the sequential file blocks, thus enhancing the speed of access.
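As a rough illustration, the Python sketch below uses a sorted in-memory list to stand in for the key-ordered blocks of a sequential file; the record layout is an assumption made only for the example.

from bisect import bisect_left

# Records held in primary-key order, standing in for key-ordered disk blocks.
records = [
    {"key": 10, "name": "A"},
    {"key": 20, "name": "B"},
    {"key": 35, "name": "C"},
    {"key": 50, "name": "D"},
]
keys = [r["key"] for r in records]     # the separately stored keys

def lookup(key):
    # Binary search over the sorted keys instead of scanning every record.
    i = bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return records[i]
    return None

print(lookup(35))   # {'key': 35, 'name': 'C'}
print(lookup(40))   # None

Because the keys are kept in order, a lookup examines only about log2(n) positions rather than scanning the whole file.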
Updating a sequential file usually creates a new file so that the record sequence on the primary key is maintained. The update operation first copies into the new file all records up to the point where the update is required, then writes the updated record, followed by the remainder of the records. This method of updating a sequential file automatically creates a backup copy. Adding a record requires shifting all records from the point of insertion to the end of the file to create space for the new record. On the other hand, deleting a record requires a compaction of the file space.
The basic advantage of a sequential file is sequential processing, as the next record is easily accessible despite the absence of any auxiliary data structure. However, even simple queries are time consuming for large files, and a single update is expensive because a new file must be created. To reduce the cost per update, all update requests are therefore sorted in the order of the sequential file. This update file is then used to update the sequential file in a single pass. The file containing the updates is sometimes referred to as a transaction file, and this process is called the batch mode of updating. In this mode, each record of the master sequential file is checked for one or more possible updates by comparing it with the update information in the transaction file. The records are written to the new master file in sequential order. A record that requires multiple updates is written only after all the updates have been applied to it, and a record that is to be deleted is not written to the new master file at all. Thus a new, updated master file is created from the transaction file and the old master file; update, insertion, and deletion of records in a sequential file all require the creation of a new file.
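The sketch below illustrates this batch mode of updating in Python. The record layout and the operation codes ('U' for update, 'D' for delete, 'I' for insert) are assumptions made for the example; the point is that one ordered pass over the old master file and the sorted transaction file produces the new master file.

# Merge a sorted transaction file into a sorted master file,
# producing a new master file.

master = [
    {"key": 1, "val": "a"},
    {"key": 3, "val": "c"},
    {"key": 5, "val": "e"},
]
transactions = [                     # sorted on the same primary key
    {"key": 3, "op": "U", "val": "c2"},
    {"key": 4, "op": "I", "val": "d"},
    {"key": 5, "op": "D"},
]

def batch_update(master, transactions):
    new_master, i, j = [], 0, 0
    while i < len(master):
        rec = dict(master[i])
        # Write any inserted records whose keys precede the current master record.
        while j < len(transactions) and transactions[j]["key"] < rec["key"]:
            t = transactions[j]; j += 1
            if t["op"] == "I":
                new_master.append({"key": t["key"], "val": t["val"]})
        # Apply every update or delete that matches the current master record.
        deleted = False
        while j < len(transactions) and transactions[j]["key"] == rec["key"]:
            t = transactions[j]; j += 1
            if t["op"] == "U":
                rec["val"] = t["val"]
            elif t["op"] == "D":
                deleted = True
        if not deleted:              # deleted records are simply not copied
            new_master.append(rec)
        i += 1
    # Any remaining insertions go at the end of the new master file.
    for t in transactions[j:]:
        if t["op"] == "I":
            new_master.append({"key": t["key"], "val": t["val"]})
    return new_master

print(batch_update(master, transactions))
# [{'key': 1, 'val': 'a'}, {'key': 3, 'val': 'c2'}, {'key': 4, 'val': 'd'}]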
Indexed File Organization
o In this method, the address of each record's data block is kept in an index, so searching for a record in a huge database is quick and easy.
o This method requires extra space on the disk to store the index values.
o When new records are inserted, the file has to be reorganized to maintain the key sequence.
o When a record is deleted, the space used by it needs to be released; otherwise, the performance of the database will slow down.
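A small Python sketch of the idea follows; the dictionary-based index and the block/offset layout are illustrative assumptions, not the structure used by any specific database.

# An index maps each primary key to the block (and offset) holding the record.
blocks = {
    0: [{"key": 10, "name": "A"}, {"key": 20, "name": "B"}],
    1: [{"key": 35, "name": "C"}, {"key": 50, "name": "D"}],
}

# Dense index: one entry per record, kept in key order.
index = {10: (0, 0), 20: (0, 1), 35: (1, 0), 50: (1, 1)}

def fetch(key):
    # The index tells us exactly which block to read; no scan is needed.
    if key not in index:
        return None
    block_no, offset = index[key]
    return blocks[block_no][offset]

print(fetch(35))   # {'key': 35, 'name': 'C'}

The extra disk space mentioned above corresponds to the index entries themselves, which must also be maintained whenever records are inserted or deleted.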
Hashed File Organization
When a record has to be retrieved using the hash key columns, the address is generated by the hash function, and the whole record is retrieved using that address. In the same way, when a new record has to be inserted, the address is generated using the hash key and the record is inserted directly at that address. The same process applies to delete and update operations.
In this method, there is no need to search or sort the entire file; each record is stored at a location determined by the hash function rather than in key order.
A hashing algorithm converts a primary key value into a record address. The most popular form of hashing is division hashing with chained overflow.
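Below is a brief Python sketch of division hashing with chained overflow; the bucket count and record layout are assumptions made for the example.

NUM_BUCKETS = 7
buckets = [[] for _ in range(NUM_BUCKETS)]   # each bucket holds an overflow chain

def bucket_of(key):
    return key % NUM_BUCKETS                 # division hash function

def insert(record):
    buckets[bucket_of(record["key"])].append(record)

def search(key):
    # Only the one addressed bucket is examined, then its chain is scanned.
    for record in buckets[bucket_of(key)]:
        if record["key"] == key:
            return record
    return None

def delete(key):
    chain = buckets[bucket_of(key)]
    buckets[bucket_of(key)][:] = [r for r in chain if r["key"] != key]

insert({"key": 15, "name": "A"})
insert({"key": 22, "name": "B"})   # 22 % 7 == 1, same bucket as 15, so it is chained
print(search(22))                  # {'key': 22, 'name': 'B'}
delete(15)
print(search(15))                  # None

Equality search touches only the single addressed bucket, which is why hashing is fast for exact-match queries but unhelpful for range queries.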
Advantages of Hashed file organization
1. Insertion or search on hash-key is fast.
2. Best if equality search is needed on hash-key.
Disadvantages of Hashed file organization
1. It is a complex file organization method.
2. Searches other than equality on the hash key (for example, range searches) are slow.
3. It suffers from disk space overhead.