08 File Handling
Dept. of CSE
Manipal University Jaipur
File Organization
Although the database presents data in the form of
relations, this is only a logical representation.
Ultimately, all the data must be stored on the hard disk as
files.
From a performance point of view, it is very important that
this organization lets the database software access data
quickly and efficiently.
Files on Disk
How do we store the records on the hard disk?
We could rely on the operating system. But the way we access
ordinary data is quite different from the way we access
database data.
In a database, we often do not need the whole file; just one
or two records are required. So, we let the DBMS use its own
file organization.
Database ====> Files / Block ====> Record ====> Fields
We divide each table into blocks and then store them on the
hard disk.
If the size of a block is 1024 bytes and one record is 4
bytes, how many records can be stored in one block?
1024 / 4 = 256, i.e., the blocking factor is 256. The blocking
factor is the average number of records per block.
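The blocking-factor arithmetic above can be sketched in a few lines of Python. The function name `blocking_factor` is only illustrative, and the sketch assumes fixed-length records that are never split across blocks.

```python
def blocking_factor(block_size: int, record_size: int) -> int:
    """Whole records that fit in one block (records are not split)."""
    if record_size <= 0 or record_size > block_size:
        raise ValueError("record size must be positive and fit in a block")
    # Integer division discards the unused space at the end of the block.
    return block_size // record_size


print(blocking_factor(1024, 4))    # 256, the example from the slide
print(blocking_factor(1024, 100))  # 10, with 24 bytes left unused
```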
Fixed Length v/s Variable Length Records
Consider a list of mobile numbers where each number is
exactly 10 digits long.
Retrieving a number is simple here because we can read
10 characters at a time.
But consider a list of names. How do we know the size of
each name? They can vary.
To make matters more complex, a record can contain a
certain number of fixed-length fields along with some
variable-length fields.
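A minimal sketch of the difference, assuming ASCII mobile numbers and a one-byte length prefix in front of each name. The length prefix is just one common way to store variable-length fields, not necessarily the one a given DBMS uses. The sample numbers and names are made up.

```python
import struct

# Fixed length: every mobile number is exactly 10 characters,
# so record i starts at byte offset i * 10.
numbers = ["9876543210", "9123456780", "9000000001"]
fixed_file = "".join(numbers).encode("ascii")


def read_fixed(data: bytes, i: int, size: int = 10) -> str:
    return data[i * size:(i + 1) * size].decode("ascii")


# Variable length: store a 1-byte length before each name
# (assumes every encoded name is at most 255 bytes long).
def pack_names(names):
    out = bytearray()
    for name in names:
        raw = name.encode("utf-8")
        out += struct.pack("B", len(raw)) + raw
    return bytes(out)


def unpack_names(data: bytes):
    names, pos = [], 0
    while pos < len(data):
        (length,) = struct.unpack_from("B", data, pos)
        pos += 1
        names.append(data[pos:pos + length].decode("utf-8"))
        pos += length
    return names


print(read_fixed(fixed_file, 1))
print(unpack_names(pack_names(["Asha", "Raj", "Meenakshi"])))
```

With fixed-length records the position of record i is computed directly; with variable-length records the file has to be walked prefix by prefix unless extra offset information is kept.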
Records on Disk in Blocks
Spanned: It allows part of a record to be stored in one
block, with the rest of the record stored in the next block.
R1 R2 R3 R4 | R4 R5 R6 R7 (record R4 spans two blocks)
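As a rough illustration of what spanning buys, the sketch below counts the blocks needed with and without it. The function names are hypothetical, and the spanned case ignores the small per-block pointer overhead a real system would need to link the two halves of a split record.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division."""
    return -(-a // b)


def blocks_needed(num_records: int, record_size: int,
                  block_size: int, spanned: bool) -> int:
    if spanned:
        # Records may cross block boundaries, so no space is wasted.
        return ceil_div(num_records * record_size, block_size)
    # Unspanned: only whole records per block; leftover bytes are unused.
    per_block = block_size // record_size
    return ceil_div(num_records, per_block)


print(blocks_needed(30_000, 100, 1024, spanned=False))  # 3000
print(blocks_needed(30_000, 100, 1024, spanned=True))   # 2930
```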
Index files are typically much smaller than the original file
Two basic kinds of indices:
Ordered indices: search keys are stored in sorted order
Hash indices: search keys are distributed uniformly
across “buckets” using a “hash function”.
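The two kinds of indices can be contrasted with a small in-memory sketch. The keys, the "block N" pointers, the bucket count, and the modulo hash function are all made-up illustrations, not a real DBMS layout.

```python
import bisect

# Hypothetical search keys and the data blocks they live in.
data = {17: "block 3", 4: "block 0", 42: "block 7", 23: "block 5"}

# Ordered index: entries kept sorted by search key, searched by binary search.
ordered_entries = sorted(data.items())
ordered_keys = [k for k, _ in ordered_entries]


def ordered_lookup(key):
    i = bisect.bisect_left(ordered_keys, key)
    if i < len(ordered_keys) and ordered_keys[i] == key:
        return ordered_entries[i][1]
    return None


# Hash index: a hash function spreads the keys across buckets.
NUM_BUCKETS = 4
buckets = [[] for _ in range(NUM_BUCKETS)]
for k, ptr in data.items():
    buckets[k % NUM_BUCKETS].append((k, ptr))


def hash_lookup(key):
    for k, ptr in buckets[key % NUM_BUCKETS]:
        if k == key:
            return ptr
    return None


print(ordered_lookup(23), hash_lookup(23))
```

An ordered index also supports range queries (all keys between two values) by scanning neighbouring entries, which a hash index does not.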
Classification of Indexing
Primary index: primary key + ordered
Secondary index: non-key / candidate key + unordered
Clustering index: non-key + ordered
Primary Indexing
The data file is ordered on the primary key, and we build the
index on the primary key.
A primary index is an ordered file whose records are of
fixed length with two fields. The first field is the same as
the primary key of the data file, and the second field is a
pointer to the data block where that key is stored.
An index entry is created for the first record of each block;
these first records are known as block anchors.
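A lookup through a primary index can be sketched as follows, assuming a tiny made-up data file with three records per block. The index holds one (anchor key, block number) entry per block.

```python
import bisect

# Hypothetical data file, ordered on the primary key, 3 records per block.
blocks = [
    [(1, "a"), (3, "b"), (5, "c")],
    [(8, "d"), (10, "e"), (12, "f")],
    [(15, "g"), (20, "h"), (21, "i")],
]

# One index entry per block: the key of its first record (the block anchor).
primary_index = [(block[0][0], n) for n, block in enumerate(blocks)]
anchor_keys = [k for k, _ in primary_index]


def search(key):
    # Binary search for the last anchor that is <= key.
    pos = bisect.bisect_right(anchor_keys, key) - 1
    if pos < 0:
        return None  # key is smaller than every anchor
    block_no = primary_index[pos][1]
    # One more access: read that data block and look inside it.
    for k, value in blocks[block_no]:
        if k == key:
            return value
    return None


print(search(10))
```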
Dense Indexing
Dense Index: an index record appears for every search-key
value in the file.
Sparse Indexing
Sparse Index: contains index records for only some search-
key values.
Applicable only when the records are sequentially ordered on
the search key.
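To make the size difference concrete, the sketch below builds both kinds of index over the same sorted keys. The keys and the block size of three records are made up, and a Python dict stands in for the dense index file.

```python
sorted_keys = [2, 5, 7, 11, 13, 17, 19, 23]  # hypothetical, already ordered
RECORDS_PER_BLOCK = 3

# Dense index: one entry (key -> block number) for every search-key value.
dense_index = {key: pos // RECORDS_PER_BLOCK
               for pos, key in enumerate(sorted_keys)}

# Sparse index: one entry only for the first key of each block.
sparse_index = [(sorted_keys[pos], pos // RECORDS_PER_BLOCK)
                for pos in range(0, len(sorted_keys), RECORDS_PER_BLOCK)]


def sparse_block_for(key):
    """Block that would hold `key`: the last entry whose key is <= key."""
    candidate = None
    for anchor, block_no in sparse_index:
        if anchor <= key:
            candidate = block_no
        else:
            break
    return candidate


print(len(dense_index), len(sparse_index))  # 8 3
print(dense_index[13], sparse_block_for(13))  # 1 1
```

The sparse lookup only works because the file is ordered: it relies on every key in a block being at least as large as that block's first key.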
Secondary Indexing
A secondary index provides a secondary means of accessing
a file for which a primary access path already exists.
It will be a dense index, i.e., an index entry is created for
every record in the file.
A secondary index does not have any impact on how the rows
are actually organized in the data blocks.
They can be in any order. The only ordering is w.r.t. the
index key in the index blocks.
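A small sketch of this idea, assuming a made-up file of (roll number, name) records stored in no particular order and a secondary index built on the name. For simplicity the names are assumed to be unique.

```python
import bisect

# Hypothetical data blocks; records are NOT ordered by name.
blocks = [
    [(3, "Ravi"), (1, "Zoya")],
    [(2, "Amit"), (4, "Neha")],
]

# Dense index: one (name, block number, slot) entry per record,
# sorted on the index key only. The data blocks are left untouched.
secondary_index = sorted(
    (name, b, s)
    for b, block in enumerate(blocks)
    for s, (_, name) in enumerate(block)
)
index_keys = [entry[0] for entry in secondary_index]


def find_by_name(name):
    i = bisect.bisect_left(index_keys, name)
    if i < len(index_keys) and index_keys[i] == name:
        _, b, s = secondary_index[i]
        return blocks[b][s]
    return None


print(find_by_name("Neha"))
```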
Clustering Indexing
It is created on a data file whose records are physically
ordered on a non-key attribute, i.e., an attribute that does
not have a distinct value for each record.
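One common way to realize this is to keep one index entry per distinct value, pointing to the first block that contains that value. The sketch below follows that scheme as an assumption. The department names, record ids, and block size of two are made up.

```python
# Hypothetical file, physically ordered on the non-key attribute `dept`.
records = [("CSE", "r1"), ("CSE", "r2"), ("CSE", "r3"),
           ("ECE", "r4"), ("ME", "r5"), ("ME", "r6")]
RECORDS_PER_BLOCK = 2
blocks = [records[i:i + RECORDS_PER_BLOCK]
          for i in range(0, len(records), RECORDS_PER_BLOCK)]

# One entry per distinct value: the first block where the value appears.
clustering_index = {}
for block_no, block in enumerate(blocks):
    for dept, _ in block:
        clustering_index.setdefault(dept, block_no)


def find_all(dept):
    start = clustering_index.get(dept)
    if start is None:
        return []
    found = []
    # Matching records are contiguous, so scan forward until they stop.
    for block in blocks[start:]:
        for d, rid in block:
            if d == dept:
                found.append(rid)
            elif found:
                return found
    return found


print(clustering_index)
print(find_all("CSE"))
```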
Primary Index Example
Suppose that we have an ordered file of 30,000 records on a
disk with a block size of 1024 bytes. Records are fixed-length,
unspanned, and 100 bytes in size. Suppose we have created a
primary index on the key field, with a key size of 9 bytes and
a block pointer of size 6 bytes. Find the average number of
block accesses needed to search for a record with and without
the index.
Without Indexing:
Records / block = 1024 / 100 = 10.24
Since it is unspanned, records / block = 10
Data blocks required to hold 30,000 records = 30,000 / 10 = 3,000
Block accesses to search a record (binary search on the
ordered file) = log2 3000 = 11.55 = Approx. 12
Primary Index Example
With Indexing:
Index record size = 9 + 6 = 15 bytes
Index records / block = 1024 / 15 = 68.266
Since it is unspanned, index records / block = 68
Since it is a primary index, no. of index records = no. of
data blocks = 3,000 (due to block anchors)
No. of index blocks = 3000 / 68 = 44.11 = Approx. 45
No. of block accesses required = log2 45 + 1 = 6 + 1 = 7
(binary search on the index, plus one access to the data block)
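The calculation above can be checked with a short script. The function and parameter names are illustrative, and the sketch assumes a single-level index searched by binary search, as in the worked example.

```python
import math


def primary_index_cost(num_records, record_size, block_size,
                       key_size, pointer_size):
    """Return (block accesses without index, block accesses with index)."""
    # Data file: unspanned, so only whole records per block.
    records_per_block = block_size // record_size
    data_blocks = math.ceil(num_records / records_per_block)
    # Without index: binary search directly on the ordered data blocks.
    without_index = math.ceil(math.log2(data_blocks))

    # Primary index: one entry per data block (block anchors).
    entries_per_block = block_size // (key_size + pointer_size)
    index_blocks = math.ceil(data_blocks / entries_per_block)
    # Binary search on the index blocks, plus one data-block access.
    with_index = math.ceil(math.log2(index_blocks)) + 1
    return without_index, with_index


print(primary_index_cost(30_000, 100, 1024, 9, 6))  # (12, 7)
```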
Secondary Index Example
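Same question as above, except that the data file is not
ordered on the key field.
Without Indexing:
Records / block = 1024 / 100 = 10.24
Since it is unspanned, records / block = 10
Data blocks required to hold 30,000 records = 30,000 / 10 = 3,000
Since the data records are unsorted, a linear search is
needed: block accesses to search a record = 3000 in the worst
case (3000 / 2 = 1500 on average)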
Secondary Index Example
With Indexing:
Index record size = 9 + 6 = 15 bytes
Index records / block = 1024 / 15 = 68.266
Since it is unspanned, index records / block = 68
No. of index records = 30,000 (dense index: one per record)
No. of index blocks required = 30,000 / 68 = 441.176 = Approx. 442
No. of block accesses required = log2 442 + 1 = 9 + 1 = 10
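The same check for the secondary index, again as a sketch with illustrative names. It assumes a dense single-level index searched by binary search, and it reports the worst-case linear scan for the unindexed file.

```python
import math


def secondary_index_cost(num_records, record_size, block_size,
                         key_size, pointer_size):
    """Return (worst-case accesses without index, accesses with index)."""
    records_per_block = block_size // record_size
    data_blocks = math.ceil(num_records / records_per_block)
    # Unordered file: a linear scan may have to read every data block.
    without_index = data_blocks

    # Dense secondary index: one entry per record, not per block.
    entries_per_block = block_size // (key_size + pointer_size)
    index_blocks = math.ceil(num_records / entries_per_block)
    # Binary search on the index blocks, plus one data-block access.
    with_index = math.ceil(math.log2(index_blocks)) + 1
    return without_index, with_index


print(secondary_index_cost(30_000, 100, 1024, 9, 6))  # (3000, 10)
```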