Unit 5 DBMS
A database consists of a huge amount of data. In an RDBMS the data is grouped into tables, and
each table holds related records. A user sees the data as tables, but this huge amount of data is
physically stored on storage media in the form of files.
File – A file is a named collection of related information that is recorded on secondary storage such as
magnetic disks, magnetic tapes and optical disks.
File Organization
o A file is a collection of records. Using the primary key, we can access the records. The type and
frequency of access that can be supported are determined by the type of file organization used for
a given set of records.
o File organization is a logical relationship among the various records. It defines how file
records are mapped onto disk blocks.
o File organization describes the way in which the records are stored in blocks and how the
blocks are placed on the storage medium.
o The first approach to mapping the database to files is to use several files and store records of
only one fixed length in any given file. An alternative approach is to structure the files so that
they can hold records of multiple lengths.
o Files of fixed-length records are easier to implement than files of variable-length records.
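The mapping of fixed-length records onto blocks can be sketched roughly as follows. This is only an illustration: the record layout (a 4-byte id plus a 20-byte name), the 100-byte block size and all function names are assumptions made for the example, not part of any particular DBMS.

```python
import struct

# Hypothetical layout: 4-byte little-endian int id + 20-byte name = 24 bytes.
RECORD_FORMAT = "<i20s"
RECORD_SIZE = struct.calcsize(RECORD_FORMAT)
BLOCK_SIZE = 100                                  # assumed, deliberately tiny
RECORDS_PER_BLOCK = BLOCK_SIZE // RECORD_SIZE     # 4 records fit in one block


def pack_record(rec_id, name):
    """Encode one record into exactly RECORD_SIZE bytes (name is null-padded)."""
    return struct.pack(RECORD_FORMAT, rec_id, name.encode("utf-8"))


def unpack_record(raw):
    """Decode RECORD_SIZE bytes back into (id, name)."""
    rec_id, name = struct.unpack(RECORD_FORMAT, raw)
    return rec_id, name.rstrip(b"\x00").decode("utf-8")


def build_file(records):
    """Pack records block by block, padding each block to BLOCK_SIZE."""
    blocks = []
    for i in range(0, len(records), RECORDS_PER_BLOCK):
        chunk = b"".join(pack_record(*r) for r in records[i:i + RECORDS_PER_BLOCK])
        blocks.append(chunk.ljust(BLOCK_SIZE, b"\x00"))
    return b"".join(blocks)


def locate(record_number):
    """Return (block number, byte offset in that block) of the n-th record, 0-based."""
    block, slot = divmod(record_number, RECORDS_PER_BLOCK)
    return block, slot * RECORD_SIZE


def read_record(file_bytes, record_number):
    """Fetch one record by computing its position, without scanning the file."""
    block, offset = locate(record_number)
    start = block * BLOCK_SIZE + offset
    return unpack_record(file_bytes[start:start + RECORD_SIZE])


records = [(i, f"name{i}") for i in range(1, 7)]
data = build_file(records)
print(locate(5), read_record(data, 5))
```

Because every record has the same size, the position of any record follows from simple arithmetic. With variable-length records this shortcut is not available, which is one reason they are harder to implement.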
Suppose a file holds the records R1, R3 and so on up to R9 and R8, stored one after another in a
sequence. Each record is simply a row of a table. If we want to insert a new record R2 into the
sequence, it is placed at the end of the file.
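The append-at-end behaviour can be sketched with a plain Python list standing in for the file. The function names and the row contents are hypothetical and chosen only for illustration.

```python
def insert(file_records, record):
    """A new record is simply placed at the end of the file."""
    file_records.append(record)


def find(file_records, key):
    """Sequential search: read records in order until the key matches.

    Returns (row, number of records read); row is None if the key is absent.
    """
    for reads, (rec_key, row) in enumerate(file_records, start=1):
        if rec_key == key:
            return row, reads
    return None, len(file_records)


data_file = [("R1", "row 1"), ("R3", "row 3"), ("R9", "row 9"), ("R8", "row 8")]
insert(data_file, ("R2", "row 2"))
print(data_file[-1])          # the new record sits at the end
print(find(data_file, "R2"))  # every earlier record is read first
```

Insertion is cheap because nothing is moved, but locating R2 afterwards means reading every record stored before it.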
o It is a fast and efficient method for handling huge amounts of data.
o In this method, files can easily be stored on cheaper storage mechanisms such as magnetic tapes.
o It is simple in design. It does not require much effort to store the data.
o This method is used when most of the records have to be accessed, such as calculating the grades
of students or generating salary slips.
o This method is used for report generation or statistical calculations.
o It wastes time when only one particular record is required, because we cannot jump directly to
that record but have to move through the file sequentially.
o The sorted file method takes more time and space for sorting the records.
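The extra work of the sorted file method can be illustrated with a small sketch. The integer keys and both function names are assumptions for the example. The insert counts how many existing records have to move to keep the file in key order.

```python
def sorted_insert(keys, new_key):
    """Insert new_key keeping the file sorted; return how many records had to move."""
    pos = 0
    while pos < len(keys) and keys[pos] < new_key:
        pos += 1
    keys.insert(pos, new_key)
    return len(keys) - 1 - pos


def sorted_scan(keys, target):
    """Sequential scan that can stop early once the keys exceed the target."""
    reads = 0
    for key in keys:
        reads += 1
        if key == target:
            return True, reads
        if key > target:
            break
    return False, reads


keys = [1, 3, 8, 9]
moved = sorted_insert(keys, 2)
print(keys, moved)
print(sorted_scan(keys, 7))
```

Keeping the file sorted makes each insert more expensive, since later records must be shifted. In return, a sequential scan for a missing key can stop as soon as it passes the place where the key would have been.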
When a record has to be retrieved using the hash key columns, the address is generated from them, and the
whole record is retrieved using that address. In the same way, when a new record has to be inserted,
the address is generated using the hash key and the record is inserted directly at that address. The same
process is applied in the case of delete and update.
In this method, there is no need to search or sort the entire file. Each record is stored at the address
computed from its hash key, so records end up scattered across memory rather than in key order.
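A minimal sketch of this idea is shown below. The hash function (key modulo the number of buckets), the bucket count and the `HashFile` class are all hypothetical. Collisions are handled by keeping a small list per bucket, which is one common choice and an assumption here.

```python
NUM_BUCKETS = 8  # assumed number of bucket addresses


def address(key):
    """Hypothetical hash function: maps a hash key to a bucket address."""
    return key % NUM_BUCKETS


class HashFile:
    def __init__(self):
        self.buckets = [[] for _ in range(NUM_BUCKETS)]

    def insert(self, key, record):
        """Insert or update: go straight to the computed address."""
        bucket = self.buckets[address(key)]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, record)   # update in place
                return
        bucket.append((key, record))

    def get(self, key):
        """Retrieve a record by looking only in its bucket."""
        for existing_key, record in self.buckets[address(key)]:
            if existing_key == key:
                return record
        return None

    def delete(self, key):
        """Remove a record; return True if something was deleted."""
        bucket = self.buckets[address(key)]
        before = len(bucket)
        bucket[:] = [(k, r) for k, r in bucket if k != key]
        return len(bucket) < before


f = HashFile()
f.insert(103, "record A")
f.insert(111, "record B")   # 103 and 111 land in the same bucket
print(address(103), address(111), f.get(111))
```

Insert, lookup, update and delete all start by computing the same address, so none of them needs to scan or sort the rest of the file.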
B+ File Organization
o B+ tree file organization is an advanced form of the indexed sequential access method. It uses
a tree-like structure to store records in the file.
o It uses the same key-index concept, where the primary key is used to sort the records. For each
primary key, an index value is generated and mapped to the record.
o The B+ tree is similar to a binary search tree (BST), but a node can have more than two children. In
this method, all the records are stored only at the leaf nodes. Intermediate nodes act as pointers to
the leaf nodes. They do not contain any records.
The above B+ tree shows that:
o There is one root node of the tree, i.e., 25.
o There is an intermediary layer of nodes. They do not store the actual records. They have only
pointers to the leaf nodes.
o The nodes to the left of the root contain values smaller than the root, and the nodes to the right
contain values larger than the root; in the intermediary layer these are 15 and 30 respectively.
o There is only one leaf level, which holds the actual values, i.e., 10, 12, 17, 20, 24, 27 and 29.
o Searching for any record is easier because the tree is balanced: all the leaf nodes are at the same
depth.
o In this method, any record can be reached by traversing a single path from the root to a leaf.
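The single-path search can be sketched on a tree built from the values quoted above (root 25, intermediate nodes 15 and 30, leaf values 10 to 29). How the values are split across leaf nodes is an assumption, since the figure is not reproduced here. The `Node` class and `search` function are likewise hypothetical.

```python
from bisect import bisect_right


class Node:
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children   # None marks a leaf node

    @property
    def is_leaf(self):
        return self.children is None


def search(root, key):
    """Follow one root-to-leaf path; return (found, keys of the nodes visited)."""
    node, visited = root, []
    while not node.is_leaf:
        visited.append(node.keys)
        # Keys smaller than a separator go left, keys greater or equal go right.
        node = node.children[bisect_right(node.keys, key)]
    visited.append(node.keys)
    return key in node.keys, visited


# Assumed leaf grouping; the last leaf is empty because no value >= 30 is listed.
leaves = [Node([10, 12]), Node([17, 20, 24]), Node([27, 29]), Node([])]
root = Node([25], [Node([15], leaves[:2]), Node([30], leaves[2:])])

print(search(root, 20))
print(search(root, 28))
```

Every lookup visits exactly one node per level, so the cost depends on the height of the tree and not on the total number of records.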
In cluster file organization, two or more related tables/records are stored within the same file, known as
a cluster. These files hold two or more tables in the same data block, and the key attributes
that are used to map these tables together are stored only once.
Thus it lowers the cost of searching and retrieving related records from different files, as they are now
combined and kept in a single cluster.
For example, suppose we have two tables (relations), Employee and Department, that are related to
each other.
If we have to insert, update or delete any record, we can do so directly. Data is sorted based on the
primary key or the key with which searching is done. The cluster key is the key on which the tables
are joined.
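As a rough sketch of the Employee and Department example, the two tables can be stored together under their shared key. The column values, the use of a department id as the cluster key in this snippet, and the `build_cluster` function are assumptions for illustration.

```python
# Hypothetical rows: Department(dep_id, name) and Employee(emp_id, name, dep_id).
departments = [(10, "Sales"), (20, "HR")]
employees = [(1, "Asha", 10), (2, "Ravi", 20), (3, "Meena", 10)]


def build_cluster(departments, employees):
    """Store rows of both tables together, keyed once by the cluster key."""
    cluster = {}
    for dep_id, dep_name in departments:
        cluster[dep_id] = {"department": dep_name, "employees": []}
    for emp_id, emp_name, dep_id in employees:
        entry = cluster.setdefault(dep_id, {"department": None, "employees": []})
        entry["employees"].append((emp_id, emp_name))
    return cluster


cluster = build_cluster(departments, employees)
print(cluster[10])
```

The cluster key appears once per group rather than once per employee row, and a department together with its employees can be read in a single lookup instead of being joined from two separate files.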
Types of Cluster File Organization – There are two ways to implement this method:
1. Indexed Clusters –
In indexed clustering, the records are grouped based on the cluster key and stored together. The
Employee and Department relationship mentioned above is an example of an
indexed cluster, where the records are grouped by Department ID.
2. Hash Clusters –
This is very similar to the indexed cluster, the only difference being that instead of storing the
records based on the cluster key, we generate a hash key value for the cluster key and store
together the records with the same hash key value.
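The difference between the two variants can be sketched as follows. The row contents, the `dep_id` field name, the modulo hash and the bucket count are hypothetical choices for the example.

```python
def indexed_cluster(rows, cluster_key):
    """Group rows by cluster key and keep the groups ordered by that key."""
    groups = {}
    for row in rows:
        groups.setdefault(row[cluster_key], []).append(row)
    return sorted(groups.items(), key=lambda item: item[0])


def hash_cluster(rows, cluster_key, num_buckets=4):
    """Place each row in the bucket given by a hash of its cluster key."""
    buckets = [[] for _ in range(num_buckets)]
    for row in rows:
        buckets[row[cluster_key] % num_buckets].append(row)
    return buckets


rows = [
    {"emp": "Asha", "dep_id": 10},
    {"emp": "Ravi", "dep_id": 20},
    {"emp": "Meena", "dep_id": 10},
    {"emp": "John", "dep_id": 14},
]

print([key for key, _ in indexed_cluster(rows, "dep_id")])
print([row["emp"] for row in hash_cluster(rows, "dep_id")[2]])
```

In the indexed version each group holds exactly one cluster key value and the groups stay in key order. In the hashed version, rows with different cluster keys (10 and 14 here) can share a bucket because they hash to the same value.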
Introduction of B-Tree
Introduction:
B-Tree is a self-balancing search tree. Most other self-balancing search trees
(like AVL and Red-Black Trees) assume that everything is in main memory. To understand
the use of B-Trees, we must think of a huge amount of data that cannot fit in main memory.
When the number of keys is high, the data is read from disk in the form of blocks. Disk access time
is very high compared to main memory access time. The main idea of using B-Trees is to
reduce the number of disk accesses. Most of the tree operations (search, insert, delete, max,
min, etc.) require O(h) disk accesses, where h is the height of the tree. A B-tree is a fat tree: its
height is kept low by putting the maximum possible number of keys in each node. Generally, the
B-Tree node size is kept equal to the disk block size. Since the height of the B-tree is low, the total
number of disk accesses for most operations is reduced significantly compared to balanced binary
search trees like AVL trees and Red-Black trees.
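The effect of a fat node on the height can be estimated with a short calculation. The figures used (one million keys, 100 children per node) are assumed round numbers. The `levels_needed` function gives only a rough estimate of the number of levels, not an exact B-tree height.

```python
def levels_needed(num_keys, fanout):
    """Smallest h with fanout**h >= num_keys: a rough count of levels per lookup."""
    levels, capacity = 0, 1
    while capacity < num_keys:
        capacity *= fanout
        levels += 1
    return levels


n = 1_000_000
print(levels_needed(n, 2))     # balanced binary tree: about 20 levels
print(levels_needed(n, 100))   # node with 100 children: about 3 levels
```

If each level costs one disk access, the lookup drops from roughly 20 block reads to roughly 3 under these assumptions. This is why the node size is matched to the disk block size.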
Time Complexity of B-Tree:
Difference between Sequential, heap/Direct, Hash, ISAM, B+ Tree, Cluster file organization
in database management system (DBMS) as shown below:
https://www.tutorialcup.com/dbms/file-organization.htm
https://www.javatpoint.com/file-organization-storage
https://www.guru99.com/indexing-in-database.html