
08 File Handling

The document discusses file organization in databases, emphasizing the importance of efficient data storage on hard disks for quick access. It covers various types of record storage, including fixed and variable length records, as well as indexing methods like primary, secondary, and clustering indexing. The document also provides examples illustrating the impact of indexing on block access during data retrieval.

Ashish Kumar

Dept. of CSE
Manipal University Jaipur

File Organization
— Even though the database presents data in the form of
relations, this is only a logical representation.
— Ultimately, all the data must be stored on the hard disk as
files.
— For performance, this organization must let the database
software access data quickly and efficiently.

Files on Disk
— How should the records be stored on the hard disk?
— You can rely on the operating system, but the way we access
normal data is totally different from the way we access
database data.
— In a database, we might not require the whole file; often just
one or two records are required. So, we let the DBMS use its
own organization.
— Database ====> Files / Block ====> Record ====> Fields
— We divide each table into blocks and then store them on the
hard disk.
— If the size of a block is 1024 bytes and one record is of size
4 bytes, how many records can be stored in one block?
— 1024 / 4 = 256. This is the blocking factor, i.e., the average
number of records per block.
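The blocking-factor arithmetic above can be sketched in a few lines of Python. The function name `blocking_factor` is an illustrative assumption, and the sketch assumes fixed-length records that are not split across blocks.

```python
def blocking_factor(block_size: int, record_size: int) -> int:
    """Number of whole fixed-length records that fit in one block."""
    if record_size <= 0 or record_size > block_size:
        raise ValueError("record_size must be between 1 and block_size")
    # Integer division drops any partial record that would not fit.
    return block_size // record_size


print(blocking_factor(1024, 4))    # 256
print(blocking_factor(1024, 100))  # 10
```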
Fixed Length v/s Variable Length Records
— Consider a list of mobile numbers where each number is
exactly 10 digits long.
— Retrieving a number is simple here because we can read
10 characters at a time.
— But consider a list of names. How do we know the size of
each name? They can vary.
— To make matters more complex, a record can have a certain
number of fixed-length fields and some variable-length
fields.
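A minimal sketch of the difference, using made-up sample data. Fixed-length records can be located by offset arithmetic alone. For the variable-length names, the sketch assumes a 2-byte length prefix before each value, which is one common technique and not something the slides prescribe.

```python
import struct

# Fixed-length records: every mobile number is exactly 10 ASCII characters.
numbers = ["9876543210", "9123456780", "9000000001"]  # illustrative data
fixed_file = b"".join(n.encode("ascii") for n in numbers)


def read_fixed(data: bytes, i: int, size: int = 10) -> str:
    """Record i starts at byte i * size, so no scanning is needed."""
    return data[i * size:(i + 1) * size].decode("ascii")


# Variable-length records: assumed layout is <2-byte length><bytes of name>.
names = ["Asha", "Ravindra", "Li"]  # illustrative data
variable_file = b"".join(
    struct.pack("<H", len(n.encode("utf-8"))) + n.encode("utf-8") for n in names
)


def read_variable(data: bytes) -> list[str]:
    """Walk the file, using each length prefix to find the next record."""
    out, pos = [], 0
    while pos < len(data):
        (length,) = struct.unpack_from("<H", data, pos)
        pos += 2
        out.append(data[pos:pos + length].decode("utf-8"))
        pos += length
    return out


print(read_fixed(fixed_file, 1))
print(read_variable(variable_file))
```

Note that reading the i-th variable-length record requires walking over all earlier records, whereas the fixed-length case jumps straight to the right offset.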

Records on Disk in Blocks
— Spanned: part of a record may be stored in one block and the
rest of it in the next block.
Block 1: R1 R2 R3 R4 (first part) | Block 2: R4 (rest) R5 R6 R7
— Advantage: no wastage of memory.
— Disadvantage: the number of block accesses needed to read a
record increases.
— Unspanned: no record can be stored in more than one
block.
R1 R2 R3 R4 R5 R6 R7 |||| (|||| marks unused space in a block)
— Advantage: fewer block accesses are needed to read a
record.
— Disadvantage: memory is wasted, because the space left at the
end of a block that cannot hold a whole record stays unused.
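The space trade-off between the two layouts can be sketched as follows. The helper names are hypothetical, and the sample figures (30,000 records of 100 bytes, 1024-byte blocks) are chosen only for illustration.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling of a / b."""
    return -(-a // b)


def blocks_needed(num_records: int, record_size: int,
                  block_size: int, spanned: bool) -> int:
    if spanned:
        # Records are packed back to back; one may straddle two blocks.
        return ceil_div(num_records * record_size, block_size)
    # Unspanned: only whole records are placed in a block.
    per_block = block_size // record_size
    return ceil_div(num_records, per_block)


def wasted_bytes_per_block(record_size: int, block_size: int) -> int:
    """Unused tail of each full block in the unspanned layout."""
    return block_size % record_size


print(blocks_needed(30_000, 100, 1024, spanned=True))   # 2930
print(blocks_needed(30_000, 100, 1024, spanned=False))  # 3000
print(wasted_bytes_per_block(100, 1024))                # 24
```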
Records on Disk in Files
— Ordered File Organization: All records in a file are ordered on
some search key value.
— Searching can be done with binary search.
— Advantage: Searching is efficient, but only when we search
on the search key. If we search on any other attribute, there
is no advantage.
— Disadvantage: Insertion is expensive because the file has to
be reorganized.
— Unordered File Organization: Records are inserted wherever
space is available (usually at the end of the file).
— Searching has to be done with linear search.
— Advantage: Insertion of a record is efficient.
— Disadvantage: Searching is very inefficient.
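To make the search costs concrete, here is a small sketch that counts block accesses for both strategies. It models a file as a list of blocks, each block being a list of keys. The function names and the toy file (keys 0 to 29,999, ten per block) are assumptions for illustration only.

```python
def binary_search_blocks(blocks, key):
    """Ordered file: binary search over blocks. Returns (found, accesses)."""
    lo, hi, accesses = 0, len(blocks) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        block = blocks[mid]          # one block read from disk
        accesses += 1
        if key < block[0]:
            hi = mid - 1
        elif key > block[-1]:
            lo = mid + 1
        else:
            return key in block, accesses
    return False, accesses


def linear_search_blocks(blocks, key):
    """Unordered file: read blocks one by one. Returns (found, accesses)."""
    accesses = 0
    for block in blocks:
        accesses += 1
        if key in block:
            return True, accesses
    return False, accesses


# Toy file: 3,000 blocks of 10 keys each, ordered on the key.
blocks = [list(range(i, i + 10)) for i in range(0, 30_000, 10)]

print(binary_search_blocks(blocks, 12_345))
print(linear_search_blocks(blocks, 29_999))  # (True, 3000)
```

With 3,000 blocks the binary search never needs more than 12 block reads, while the linear search may have to read every block.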
Indexing
— Indexing mechanisms are used to speed up access to desired
data, e.g., the author catalog in a library.
— Search Key - attribute or set of attributes used to look up
records in a file.
— An index file consists of records (called index entries) of
the form
search-key | block-pointer

— Index files are typically much smaller than the original file
— Two basic kinds of indices:
— Ordered indices: search keys are stored in sorted order
— Hash indices: search keys are distributed uniformly
across “buckets” using a “hash function”.
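As a rough sketch of the hash-index idea, the snippet below spreads integer search keys over a fixed number of buckets. The bucket count, the modulo hash function, and the sample entries are all assumptions chosen for illustration; real systems use more careful hash functions and handle bucket overflow.

```python
NUM_BUCKETS = 4  # assumed bucket count for this toy example
buckets = [[] for _ in range(NUM_BUCKETS)]


def bucket_of(key: int) -> int:
    """Toy hash function: remainder of the key modulo the bucket count."""
    return key % NUM_BUCKETS


def insert(key: int, block_no: int) -> None:
    """Store an index entry (search-key, block-pointer) in its bucket."""
    buckets[bucket_of(key)].append((key, block_no))


def lookup(key: int):
    """Only the one bucket chosen by the hash function is scanned."""
    for k, block_no in buckets[bucket_of(key)]:
        if k == key:
            return block_no
    return None


for key, block_no in [(10, 0), (21, 0), (7, 1), (32, 1), (45, 2)]:
    insert(key, block_no)

print(lookup(7))   # 1
print(lookup(99))  # None
```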
Classification of Indexing
Primary index: Primary key + Ordered
Secondary index: Non-Key/Candidate Key + Unordered
Clustering index: Non-Key + Ordered

All of these are single-level indexing.

Primary Indexing
— The data file is ordered on the primary key, and we build the
index on the primary key.
— A primary index is an ordered file whose records are of
fixed length with two fields. The first field is the same as the
primary key of the data file, and the second field is a pointer
to the data block where that key is stored.
— One index entry is created for the first record of each block;
that record is known as the block anchor.

Dense Indexing
— Dense index: an index record appears for every search-key
value in the file.

Sparse Indexing
— Sparse index: contains index records for only some search-key
values.
— Applicable when records are sequentially ordered on the
search key.
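The contrast between dense and sparse indexing can be sketched on a toy ordered file. The data, the variable names, and the choice of one sparse entry per block (the first key of each block) are assumptions for illustration.

```python
from bisect import bisect_right

# Toy ordered data file: three blocks, keys sorted across the whole file.
blocks = [[5, 8, 12], [15, 21, 30], [34, 40, 47]]

# Dense index: one (key, block number) entry for every search-key value.
dense = [(key, b) for b, block in enumerate(blocks) for key in block]

# Sparse index: one entry per block, holding the block's first key.
sparse = [(block[0], b) for b, block in enumerate(blocks)]


def lookup_dense(key):
    """Binary search the dense index; the key must appear in the index."""
    keys = [k for k, _ in dense]
    pos = bisect_right(keys, key) - 1
    if pos >= 0 and dense[pos][0] == key:
        return dense[pos][1]
    return None


def lookup_sparse(key):
    """Find the last index entry <= key, then scan that one block."""
    anchors = [k for k, _ in sparse]
    pos = bisect_right(anchors, key) - 1
    if pos < 0:
        return None
    block_no = sparse[pos][1]
    return block_no if key in blocks[block_no] else None


print(len(dense), len(sparse))  # 9 3
print(lookup_dense(47))         # 2
print(lookup_sparse(21))        # 1
```

The sparse index is smaller, but it only works because the file is ordered: the lookup relies on the key lying between two consecutive index entries.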

Secondary Indexing
— A secondary index provides a secondary means of accessing
a file for which a primary access path already exists.
— It is a dense index, i.e., an index entry is created for every
record in the file.
— A secondary index has no impact on how the rows are actually
organized in data blocks.
— The rows can be in any order. The only ordering is w.r.t. the
index key in the index blocks.
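A minimal sketch of a secondary index over an unordered file. The toy data and names are assumptions. The data blocks stay in arrival order; only the index entries are sorted.

```python
from bisect import bisect_left

# Toy unordered data file: keys appear in arrival order, not sorted.
blocks = [[42, 7, 19], [3, 88, 61], [25, 14, 70]]

# Dense secondary index: one sorted (key, block number) entry per record.
index = sorted((key, b) for b, block in enumerate(blocks) for key in block)
index_keys = [k for k, _ in index]


def lookup(key):
    """Binary search the sorted index, then follow the block pointer."""
    pos = bisect_left(index_keys, key)
    if pos < len(index_keys) and index_keys[pos] == key:
        return index[pos][1]
    return None


print(lookup(88))  # 1
print(lookup(14))  # 2
print(lookup(5))   # None
```

Because the data file is unordered, a sparse index would not work here: every record needs its own entry.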
Clustering Indexing
— It is created on a data file whose records are physically
ordered on a non-key attribute, i.e., an attribute that does
not have a distinct value for each record.
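One way to picture a clustering index is sketched below. It assumes the common textbook layout in which the index holds one entry per distinct value of the ordering attribute, pointing to the first block that contains that value; the slides do not spell out this layout. The records, the attribute `dept`, and the block size are made up for illustration.

```python
# Toy file ordered on the non-key attribute dept (values repeat).
records = [(1, "A"), (1, "B"), (2, "C"), (2, "D"),
           (2, "E"), (3, "F"), (4, "G"), (4, "H")]
RECORDS_PER_BLOCK = 3  # assumed blocking factor
blocks = [records[i:i + RECORDS_PER_BLOCK]
          for i in range(0, len(records), RECORDS_PER_BLOCK)]

# One index entry per distinct dept: the first block where it appears.
cluster_index = {}
for block_no, block in enumerate(blocks):
    for dept, _ in block:
        cluster_index.setdefault(dept, block_no)


def fetch(dept):
    """Start at the indexed block and read on until a larger dept is seen."""
    if dept not in cluster_index:
        return []
    names = []
    for block in blocks[cluster_index[dept]:]:
        for d, name in block:
            if d == dept:
                names.append(name)
            elif d > dept:
                return names
    return names


print(cluster_index)  # {1: 0, 2: 0, 3: 1, 4: 2}
print(fetch(2))       # ['C', 'D', 'E']
```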

Primary Index Example
— Suppose that we have an ordered file of 30,000 records on a
disk with a block size of 1024 bytes. Records are of fixed
length (100 bytes) and unspanned. Suppose we have created a
primary index on the key field, with a key of size 9 bytes and
a block pointer of size 6 bytes. Find the average number of
block accesses needed to search for a record with and without
the index.
Without Indexing:
— Records per block = 1024 / 100 = 10.24
— Since the organization is unspanned, records per block = 10
— Data blocks required to hold 30,000 records = 30,000 / 10 = 3,000
— Block accesses to search for a record (binary search) =
log2 3000 = 11.55, i.e., approx. 12
Primary Index Example
With Indexing:
— Index record size = 9 + 6 = 15 bytes
— Index records per block = 1024 / 15 = 68.266
— Since the organization is unspanned, index records per block = 68
— Since it is a primary index, no. of index records = no. of data
blocks = 3,000 (due to block anchors)
— No. of index blocks = 3000 / 68 = 44.11, i.e., approx. 45
— No. of block accesses required = log2 45 + 1 = 6 + 1 = 7
(binary search on the index, rounded up, plus one access to
the data block)
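The primary-index calculation can be checked with a short script. The function name and parameter names are illustrative assumptions; the sketch follows the same steps as the slides (unspanned blocks, binary search, one index entry per data block).

```python
import math


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling of a / b."""
    return -(-a // b)


def primary_index_cost(num_records, block_size, record_size,
                       key_size, pointer_size):
    """Return (data_blocks, accesses_without_index,
    index_blocks, accesses_with_index)."""
    records_per_block = block_size // record_size            # unspanned
    data_blocks = ceil_div(num_records, records_per_block)
    # Binary search directly on the ordered data file.
    without_index = math.ceil(math.log2(data_blocks))

    entries_per_block = block_size // (key_size + pointer_size)
    # Primary index: one entry per data block (block anchors).
    index_blocks = ceil_div(data_blocks, entries_per_block)
    # Binary search on the index, plus one access to the data block.
    with_index = math.ceil(math.log2(index_blocks)) + 1
    return data_blocks, without_index, index_blocks, with_index


print(primary_index_cost(30_000, 1024, 100, 9, 6))  # (3000, 12, 45, 7)
```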

Secondary Index Example
— Same question as above, except that the data file is not
ordered on the search key and the index is a secondary index.

Without Indexing:
— Records per block = 1024 / 100 = 10.24
— Since the organization is unspanned, records per block = 10
— Data blocks required to hold 30,000 records = 30,000 / 10 = 3,000
— Since the data records are unsorted, a linear search is needed:
block accesses to search for a record = 3,000 in the worst case

Secondary Index Example
With Indexing:
— Index record size = 9 + 6 = 15 bytes
— Index records per block = 1024 / 15 = 68.266
— Since the organization is unspanned, index records per block = 68
— No. of index records = 30,000 (dense index: one per data record)
— No. of index blocks required = 30,000 / 68 = 441.176, i.e., approx. 442
— No. of block accesses required = log2 442 + 1 = 9 + 1 = 10
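The same check for the secondary-index case, as a sketch under the slide's assumptions (dense index with one entry per record, unordered data file, linear scan counted in the worst case). The function and parameter names are illustrative.

```python
import math


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling of a / b."""
    return -(-a // b)


def secondary_index_cost(num_records, block_size, record_size,
                         key_size, pointer_size):
    """Return (accesses_without_index, index_blocks, accesses_with_index)."""
    records_per_block = block_size // record_size            # unspanned
    data_blocks = ceil_div(num_records, records_per_block)
    # Unordered file: worst case is a linear scan of every data block.
    without_index = data_blocks

    entries_per_block = block_size // (key_size + pointer_size)
    # Dense secondary index: one entry per data record.
    index_blocks = ceil_div(num_records, entries_per_block)
    # Binary search on the index, plus one access to the data block.
    with_index = math.ceil(math.log2(index_blocks)) + 1
    return without_index, index_blocks, with_index


print(secondary_index_cost(30_000, 1024, 100, 9, 6))  # (3000, 442, 10)
```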
