MODULE - 1
The Heart of File Structure Design:
Disks (magnetic disks/optical disks) are slow. The time it takes to get information from
random access memory (RAM) is about 120 nanoseconds, i.e., 120 billionths of a second. Getting the same
information from a typical disk might take 30 milliseconds, or 30 thousandths of a second.
Disk access is therefore about a quarter of a million times longer than memory access
(30 milliseconds ÷ 120 nanoseconds = 250,000). On the other hand, disks provide enormous
capacity at much lower cost than memory, and they are nonvolatile. The tension between a
disk's relatively slow access time and its enormous, nonvolatile capacity is the driving force
behind file structure design. Good file structure design gives us access to all the data without
making our application spend a lot of time waiting for the disk.
File structure is a combination of representations for data in files and of operations for
accessing the data.
A Short History of File Structure Design
1. Early work: Early work assumed that files were stored on tapes. Access was sequential, and the
cost of access grew in direct proportion to the size of the file.
2. Emergence of disks and indexes: Sequential access was not a good solution for large
files. Disks allowed for direct access. Indexes made it possible to keep a list of keys and
the addresses of records in a small file that could be searched very quickly. With the key and
address, the user had direct access to the large primary file.
3. Emergence of trees: As indexes grew, they too became difficult to handle. Sorting the
indexes took too much time and reduced performance. The idea of using tree structures to
manage the index emerged in the early 1960s. Initially, binary search trees (BSTs) were used
for storing the records in the file. This resulted in uneven growth of the trees, which in turn
resulted in long searches requiring many disk accesses to find a record. Then AVL trees,
which are balanced trees, were used, and the problem of uneven growth was resolved.
However, AVL trees are suitable for data in memory, not for data in files. In the 1970s came
the idea of B-trees and B+ trees, which require O(log_k N) access time. Still, the efficiency
depended on the size of the file: as N (the number of records) increased, the efficiency decreased.
4. Hashing: Retrieving a record in a single access to the file. Ideally, hashing has an efficiency
of O(1). Hashing works well for files that do not change size greatly over time, but it does not
work well with dynamic files. Extendible hashing overcomes this limitation of hashing.
Logical file
The file as seen by a program. The use of a logical file allows a program to describe operations
to be performed on a file without knowing what physical file will be used. It acts as a "channel"
(like a telephone line) that hides the details of the file's location and physical format from the
program. The logical file has a logical name, which is what is used inside the program.
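A minimal sketch of this idea in C++ (the file name "students.dat" and the read loop are illustrative assumptions, not part of the notes): the program works only with the logical file, here the stream object in, while the operating system maps it to the physical file.

#include <fstream>
#include <iostream>

int main() {
    // "in" is the logical file: the name used inside the program.
    // "students.dat" names the physical file known to the operating system.
    std::ifstream in("students.dat");
    if (!in) {
        std::cerr << "cannot open physical file\n";
        return 1;
    }
    char ch;
    while (in.get(ch))   // every operation goes through the logical file
        std::cout << ch;
    return 0;
}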
The bottleneck of a disk access is moving the read/write arm. So it makes sense to store a file
in tracks that are below/above each other on different surfaces (i.e., in one cylinder), rather
than in several tracks on the same surface. Disk controllers, typically embedded in the disk
drive, act as an interface between the CPU and the disk hardware. The controller has an
internal cache (typically a few megabytes) that it uses to buffer data for read/write requests.
Estimating Capacities
Track capacity = number of sectors/track * bytes/sector
Cylinder capacity = number of tracks/cylinder * track capacity
Drive capacity = number of cylinders * cylinder capacity
Number of cylinders = number of tracks on one surface
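As a worked sketch in C++ under a purely hypothetical geometry (63 sectors/track, 512 bytes/sector, 16 tracks/cylinder, 4,096 cylinders; none of these figures come from the notes):

#include <cstdint>
#include <iostream>

int main() {
    // Hypothetical drive geometry:
    const std::uint64_t sectorsPerTrack   = 63;
    const std::uint64_t bytesPerSector    = 512;
    const std::uint64_t tracksPerCylinder = 16;    // one track per surface
    const std::uint64_t cylinders         = 4096;

    const std::uint64_t trackCap    = sectorsPerTrack * bytesPerSector;  // 32,256 bytes
    const std::uint64_t cylinderCap = tracksPerCylinder * trackCap;      // 516,096 bytes
    const std::uint64_t driveCap    = cylinders * cylinderCap;           // about 2.1 GB

    std::cout << trackCap << ' ' << cylinderCap << ' ' << driveCap << '\n';
}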
Organizing Tracks by Sector
There are two ways to organize the data on a disk: by sector and by user-defined block.
Regarding the physical placement of sectors, there are different views of the sectors on a track.
One view is of sectors as adjacent, fixed-size segments of a track that happen to hold a file.
When you want to read a series of sectors that are all in the same track, one right after the
other, you often cannot read adjacent sectors: after reading the data, it takes the disk
controller a certain amount of time to process the received information before it is ready to
accept more. If logically adjacent sectors were placed physically adjacent, we would miss the
start of the next sector while we were still processing the sector we had just read. In the
arrangement shown in the figure, it takes thirty-two revolutions to read all 32 sectors of a track.
I/O system designers have solved this problem by interleaving the sectors: leaving an
interval of several physical sectors between logically adjacent sectors. The figure below
illustrates the assignment of logical sector content to the thirty-two physical sectors of a track
with an interleaving factor of 5; it then takes five revolutions to read all 32 sectors of a track.
In the early 1990s, controller speeds improved so that disks could offer 1:1 interleaving,
meaning that successive sectors are physically adjacent and an entire track can be read in a
single rotation of the disk.
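A small C++ sketch of how such an assignment can be computed (the modular placement rule below is an illustrative assumption; real controllers used fixed interleave tables):

#include <iostream>

int main() {
    const int sectors = 32, interleave = 5;
    int physical[32];
    // Place logical sector L at physical slot (L * interleave) % sectors.
    // Since gcd(5, 32) = 1, every physical slot gets exactly one logical sector.
    for (int logical = 0; logical < sectors; ++logical)
        physical[(logical * interleave) % sectors] = logical;
    for (int slot = 0; slot < sectors; ++slot)
        std::cout << "physical slot " << slot << " holds logical sector "
                  << physical[slot] << '\n';
}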
Clusters
Another view of sector organization, designed to improve performance, is clusters. A cluster
is a fixed number of contiguous sectors. Once a given cluster has been found on a disk, all
sectors in that cluster can be accessed without requiring an additional seek.
To view a file as a series of clusters and still maintain the sectored view, the file manager ties
logical sectors to the physical clusters they belong to by using a file allocation table (FAT).
The FAT contains a list of all the clusters in a file, ordered according to the logical order of
the sectors they contain. With each cluster entry in the FAT is an entry giving the physical
location of the cluster.
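A toy FAT can be sketched as an array of "next cluster" links, as below (the cluster numbers and table size are made up for illustration):

#include <iostream>
#include <vector>

int main() {
    // fat[c] holds the next cluster of the file, or -1 at end of file.
    // Here a hypothetical file occupies clusters 2 -> 5 -> 3.
    std::vector<int> fat(8, -1);
    fat[2] = 5;
    fat[5] = 3;

    // Walk the chain from the file's first cluster
    // (which a directory entry would record).
    for (int c = 2; c != -1; c = fat[c])
        std::cout << "cluster " << c << '\n';
}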
Extents
If there is sufficient free space on a disk, it may be possible to store a file entirely in
contiguous clusters. Then we say that the file consists of one extent: all of its sectors, tracks,
and (if it is large) cylinders form one contiguous whole, and the whole file can be accessed
with the minimum amount of seeking.
Organizing Tracks by Block
Tracks can also be organized as a series of user-defined blocks rather than fixed-size sectors.
Each block is usually accompanied by sub-blocks containing extra information about the data
block, such as:
1. Count sub-block: contains the number of bytes in the accompanying data block.
2. Key sub-block: contains the keys of all the records that are stored in the
following data block.
Disks as Bottleneck
Processes are often disk-bound, i.e., the CPU often has to wait long periods of time for the
disk to transmit data before it can process it. Solutions to the disk bottleneck are:
Solution 1: Multiprogramming (the CPU works on other jobs while waiting for the disk).
Solution 2: Striping: disk striping involves splitting a file into parts and storing them on
several different drives, then letting the separate drives deliver their parts of the file to the
CPU simultaneously (it achieves parallelism).
Solution 3: RAID: Redundant Array of Independent Disks.
Solution 4: RAM disks: simulate the behaviour of a mechanical disk in main memory
(provides faster access).
Solution 5: Disk cache: a large block of main memory configured to contain pages of data
from a disk. First check the cache for the required data; if it is not available, go to the disk
and replace some page in the cache with the page from the disk containing the required data.
The Cost of Disk Access
Seek time is the time required to move the access arm to the correct cylinder. If we are
alternately accessing sectors from two files that are stored at opposite extremes of a disk
(one on the innermost cylinder, one on the outermost cylinder), seeking is very expensive.
Most hard disks available today have an average seek time of less than 10 milliseconds, and
high-performance hard disks have average seek times as low as 7.5 msec.
Rotational delay refers to the time it takes for the disk to rotate so the sector we want is
under the read/write head. A hard disk with a rotation speed of 5,000 rpm takes 12 msec for
one rotation; on average, the rotational delay is half a revolution, or 6 msec.
Transfer time: once the data we want is under the read/write head, it can be transferred. The
transfer time is given by the formula:
Transfer time = (number of bytes transferred / number of bytes on a track) × rotation time
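As a worked example with assumed figures (not from the notes): take an 8 msec average seek, 5,000 rpm rotation, and a track of 63 sectors × 512 bytes = 32,256 bytes. Reading one 512-byte sector then costs roughly 8 msec (seek) + 6 msec (rotational delay) + (512 / 32,256) × 12 msec ≈ 0.19 msec (transfer), about 14.2 msec in total; the mechanical delays dwarf the transfer itself.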
MAGNETIC TAPES
Magnetic tape units belong to a class of devices that provide no direct accessing facility but
can provide very rapid sequential access to data. Tapes are compact, stand up well under
different environmental conditions, and are easy to store and transport. Tapes were widely
used to store application data; currently, tapes are used as archival storage.
The surface of a typical tape can be seen as a set of parallel tracks, each of which is a
sequence of bits. In a nine-track tape, the nine bits that are at corresponding positions in the
nine respective tracks are taken to constitute one byte plus a parity bit, so a byte can be
thought of as a one-bit-wide slice of tape. Such a slice is called a frame. The parity bit is not
part of the data; it is used to check the validity of the data.
(Figure: a frame is a one-bit-wide slice cutting across tracks 1 through 9 of the tape.)
Frames are organised into data blocks of variable size, separated by interblock gaps (long
enough for the tape to accelerate to full speed and to decelerate to a stop, since tapes cannot
start and stop instantaneously).
The length s of magnetic tape needed is given by
s = n × (b + g), where n = number of data blocks, b = physical length of a data block, and
g = length of an interblock gap.
Effective transmission rate = effective recording density (bpi) × tape speed (ips). For
problems related to magnetic tapes, refer to the class notes.
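As a worked example with assumed figures (not from the notes): storing n = 1,000 blocks, each 6,000 bytes long, at a recording density of 6,250 bpi gives b = 6,000 / 6,250 = 0.96 inch per block; with an interblock gap of g = 0.3 inch, s = 1,000 × (0.96 + 0.3) = 1,260 inches (105 feet) of tape.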
Disks versus Tapes
In the past, both disks and tapes were used for secondary storage: disks were preferred for
random access and tapes for better sequential access. Now disks have taken over much of
secondary storage because of the decreased cost of disk storage, and tapes are used as
tertiary (archival) storage.
INTRODUCTION TO CD-ROM
CD-ROM: Compact Disc Read-Only Memory. A single disc can hold approximately 700 MB
of data. A CD-ROM is read-only: it is a publishing medium rather than a data storage and
retrieval medium like magnetic disks.
Physical Organization of CD-ROM
CD-ROM is the child of CD audio. Audio discs are designed to play music, not to provide
fast, random access to data. This biases CD toward having high storage capacity and
moderate data transfer rates, and against decent seek performance.
Reading Pits and Lands:
CD-ROMs are stamped from a glass master disc which has a coating that is changed by a
laser beam. When the coating is developed, the areas hit by the laser beam turn into pits
along the track followed by the beam. The smooth, unchanged areas between the pits are
called lands.
When we read the CD, we focus a beam of laser light on the track as it moves under the
optical pickup. The pits scatter the light, but the lands reflect it back to the pickup. This
pattern of high- and low-intensity reflected light is the signal used to reconstruct the
original digital information.
1s are represented by the transition from pit to land and back again: every time the light
intensity changes, we get a 1. The 0s are represented by the amount of time between
transitions; the longer between transitions, the more 0s we have.
Given this scheme, it is not possible to have two adjacent 1s: 1s are always separated by 0s.
In fact, due to the limits of the resolution of the optical pickup, there must be at least two 0s
between any pair of 1s. This means that every raw pattern of 8 data bits has to be translated
so that at least two 0s separate consecutive 1s. This translation is done with an EFM (Eight
to Fourteen Modulation) encoding lookup table: the EFM scheme turns the original 8 bits of
data into 14 expanded bits that can be represented as pits and lands on the disc.
CLV versus CAV
CLV (Constant Linear Velocity):
The data is stored on a single spiral track that winds for almost 3 miles from the centre to
the outer edge of the disc.
All the sectors take the same amount of space, and the storage capacity of every sector is the same.
All the sectors are written at maximum density (constant data density), so space is not
wasted in either the inner or the outer sectors.
Constant data density implies that the disc has to spin more slowly when reading the outer
sectors than when reading the inner (centre) sectors (variable speed of disc rotation).
Poor seeking performance.
CAV (Constant Angular Velocity):
The data is stored on a number of concentric tracks divided into pie-shaped sectors.
Inner sectors take less space than outer sectors, yet the storage capacity of every sector is the same.
Data is written less densely in the outer sectors and more densely in the inner sectors
(variable data density), so space is wasted in the outer sectors.
Variable data density implies that the disc rotates at constant speed, irrespective of whether
it is reading inner or outer sectors.
Seeking is fast compared to CLV.
Addressing
In CD audio each sector holds 2 kilobytes, and 75 sectors make up 1 second of audio playback.
According to the original Philips/Sony standards, a CD, whether used for audio or CD-ROM,
contains at least one hour of playing time. That means the disc is capable of holding at least
540,000 kilobytes of data: 60 minutes × 60 seconds/minute × 75 sectors/second × 2 KB/sector
= 540,000 KB. A sector is therefore addressed by its playing time, in the form
minute:second:sector (for example, 16:22:34).
1 second = 75 sectors
Each raw sector is laid out as follows (2,352 bytes in all):
| 12 bytes synch | 4 bytes sector ID | 2048 bytes user data | 4 bytes error detection | 8 bytes null | 276 bytes error correction |
A JOURNEY OF A BYTE
What happens when the following statement in the application program is executed?
write(fd,ch,1)
1. The program asks the operating system to write the contents of the variable ch to the next
available position in the file.
2. The operating system passes the job on to the file manager.
3. The file manager looks up the given file in a table containing information about it, such
as whether the file is open and available for use, what types of access are allowed, if any, and
what physical file the logical name fd corresponds to.
4. The file manager searches a file allocation table for the physical location of the sector that
is to contain the byte.
5. The file manager makes sure that the last sector in the file has been stored in a system I/O
buffer in RAM, then deposits the byte into its proper position in the buffer.
6. The file manager gives instructions to the I/O processor about where the byte is stored in
RAM and where it needs to be sent on the disk.
7. The I/O processor finds a time when the drive is available to receive the data and puts the
data in proper format for the disk. It may also buffer the data to send it out in chunks of the
proper size for the disk.
8. The I/O processor sends the data to the disk controller.
9. The controller instructs the drive to move the read/write head to the proper track, waits for
the desired sector to come under the read/write head, then sends the byte to the drive to be
deposited, bit by bit, on the surface of the disk.
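For concreteness, here is a minimal POSIX-style version of the statement in C++ (the file name and flags are illustrative assumptions; note that the notes' write(fd, ch, 1) is pseudocode, while the real call takes a pointer to the byte):

#include <fcntl.h>
#include <unistd.h>

int main() {
    int fd = open("textfile", O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0) return 1;
    char ch = 'P';
    // Ask the operating system to append one byte; everything described
    // in steps 2-9 above happens beneath this single call.
    ssize_t n = write(fd, &ch, 1);
    close(fd);
    return n == 1 ? 0 : 1;
}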
BUFFER MANAGEMENT
Buffering involves working with a large chunk of data in memory so that the number of
accesses to secondary storage can be reduced. Assume that the system has a single buffer and
is performing input and output one character at a time, alternately. In this case, the sector
containing the character to be read is constantly overwritten by the sector containing the spot
where the character will be written, and vice versa. In such a case the system needs more
than one buffer: at least one for input and another for output. Strategies to avoid this
problem:
Multiple buffering:
Suppose that a program is only writing to a disk and that it is I/O-bound. The CPU wants to
be filling a buffer at the same time that I/O is being performed. If two buffers are used and
I/O-CPU overlapping is permitted, the CPU can be filling one buffer while the contents of the
other are being transmitted to the disk. When both tasks are finished, the roles of the buffers
can be exchanged. This method is called double buffering. The technique need not be
restricted to two buffers.
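A minimal single-threaded C++ sketch of the buffer-swapping idea (the buffer size, output file name, and flushToDisk stand-in are assumptions; a real system overlaps the flush with CPU work via DMA or a second thread):

#include <algorithm>
#include <cstdio>
#include <vector>

const std::size_t BUF_SIZE = 4096;

void flushToDisk(const std::vector<char>& buf, std::FILE* f) {
    std::fwrite(buf.data(), 1, buf.size(), f);  // stands in for the disk transfer
}

int main() {
    std::vector<char> a(BUF_SIZE), b(BUF_SIZE);
    std::vector<char>* filling  = &a;  // the buffer the CPU fills
    std::vector<char>* draining = &b;  // the buffer the disk drains
    std::FILE* f = std::fopen("out.dat", "wb");
    if (!f) return 1;
    for (int block = 0; block < 8; ++block) {
        std::fill(filling->begin(), filling->end(), char('a' + block));  // CPU work
        std::swap(filling, draining);  // exchange the roles of the buffers
        flushToDisk(*draining, f);     // in a real system this overlaps the next fill
    }
    std::fclose(f);
}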
Buffer pooling:
Some file systems use a buffering scheme called buffer pooling: there is a pool of buffers,
and when a request for a sector is received, the operating system first looks to see whether
that sector is already in some buffer. If it is not, the O.S. brings the sector into some free
buffer; if no free buffer exists, it must choose an occupied buffer to replace (usually with the
LRU, least-recently-used, strategy).
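A compact C++ sketch of such a pool (the pool size, sector size, and readSectorFromDisk stand-in are illustrative assumptions):

#include <iostream>
#include <list>
#include <unordered_map>
#include <vector>

const std::size_t POOL_SIZE = 4;

std::vector<char> readSectorFromDisk(int sector) {
    return std::vector<char>(512, char(sector));  // stands in for a real disk read
}

std::list<int> lru;                               // front = most recently used
std::unordered_map<int, std::vector<char>> pool;  // sector number -> buffered data

const std::vector<char>& getSector(int sector) {
    if (pool.count(sector)) {              // hit: no disk access needed
        lru.remove(sector);
    } else {
        if (pool.size() == POOL_SIZE) {    // pool full: evict the LRU victim
            pool.erase(lru.back());
            lru.pop_back();
        }
        pool[sector] = readSectorFromDisk(sector);  // miss: one disk access
    }
    lru.push_front(sector);                // mark as most recently used
    return pool[sector];
}

int main() {
    for (int s : {1, 2, 3, 1, 4, 5, 1})
        getSector(s);
    std::cout << "buffers in use: " << pool.size() << '\n';  // at most POOL_SIZE
}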
Field Structures
Method 1: Fix the length of fields.
Here we have fixed the size of field 1 at 10 bytes, field 2 at 8 bytes, field 3 at 5 bytes, and so
on, which brings the total length of the record to 70 bytes. While reading the record, the first
10 bytes read are treated as field 1, the next 8 bytes as field 2, and so on.
The disadvantage of this method is the padding of each and every field to bring it to the
pre-defined length, which makes the file much larger: rather than using 4 bytes to store
"Ames", we use 10. We can also encounter problems with data that is too long to fit into the
allocated amount of space.
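A minimal C++ sketch of packing fixed-length fields (the 10- and 8-byte widths follow the notes; the packField helper and output file name are assumptions):

#include <algorithm>
#include <cstring>
#include <fstream>
#include <string>

// Copy src into exactly `width` bytes, space-padding (and truncating if too long).
void packField(char* dst, const std::string& src, std::size_t width) {
    std::memset(dst, ' ', width);
    std::memcpy(dst, src.data(), std::min(src.size(), width));
}

int main() {
    char record[18];                     // first two fields only: 10 + 8 bytes
    packField(record,      "Ames", 10);  // field 1: 6 of its 10 bytes are padding
    packField(record + 10, "Mary", 8);   // field 2
    std::ofstream("fixed.dat", std::ios::binary).write(record, sizeof record);
}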
Method 2: Begin each field with a length indicator.
The fields within a record are prefixed by a length byte or bytes.
Fields within a record can have different sizes.
Different records can have different length fields.
Programs which access the record must know the size and format of the length prefix.
There is external overhead for field separation equal to the size of the length prefix
per field.
04Ames04Mary0312305Maple10Stillwater07OK74075
05Mason04Alan029008Eastgate03Ada07OK74820
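A short C++ sketch of writing such length-prefixed fields (the two-digit ASCII prefix matches the example above; the writeField helper and file name are assumptions):

#include <fstream>
#include <iomanip>
#include <string>

// Prefix each field with its length as two ASCII digits, e.g. "04Ames".
void writeField(std::ostream& out, const std::string& field) {
    out << std::setw(2) << std::setfill('0') << field.size() << field;
}

int main() {
    std::ofstream out("fields.dat");
    for (const std::string& f : {"Ames", "Mary", "123", "Maple", "Stillwater", "OK74075"})
        writeField(out, f);  // yields 04Ames04Mary0312305Maple10Stillwater07OK74075
}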
Record Structures:
A record can be defined as a set of fields that belong together when the file is viewed in terms
of a higher level of organization.
Following are some of the most often used methods for organizing the records of a file.
Make records a predictable number of bytes.
Make records a predictable number of fields.
Begin each record with a length indicator.
Use an index to keep track of record addresses.
Place a delimiter at the end of each record.
Method 1: Make records a predictable number of bytes.
All records within a file have the same size.
Programs which access the file must know the record length.
Offset, or position, of the nth record of a file can be calculated.
There is no external overhead for record separation.
There may be internal fragmentation (unused space within records).
There will be no external fragmentation (unused space outside of records) except
for deleted records.
Method 2: Make records a predictable number of fields. Each record below consists of six
delimited fields, so the end of a record is recognized by counting fields:
Ames|Mary|123|Maple|Stillwater|OK74075|Mason|Alan|90|Eastgate|Ada|OK74820| …
Method 3: Begin each record with a length indicator
The records within a file are prefixed by a length byte or bytes.
Records within a file can have different sizes.
Different files can have different length records.
Programs which access the file must know the size and format of the length prefix.
Offset, or position, of the nth record of a file cannot be calculated.
There is external overhead for record separation equal to the size of the length
prefix per record.
39Ames|Mary|123|Maple|Stillwater|OK74075|35Mason|Alan|90|Eastgate|Ada|OK74820| …
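A sketch of reading records that begin with such a two-digit length prefix (the readRecord helper and file name are hypothetical):

#include <fstream>
#include <iostream>
#include <string>

// Read one record: a 2-digit ASCII length, then that many bytes.
bool readRecord(std::istream& in, std::string& rec) {
    char digits[2];
    if (!in.read(digits, 2)) return false;
    int len = (digits[0] - '0') * 10 + (digits[1] - '0');
    rec.resize(len);
    return static_cast<bool>(in.read(&rec[0], len));
}

int main() {
    std::ifstream in("records.dat", std::ios::binary);
    std::string rec;
    while (readRecord(in, rec))
        std::cout << rec << '\n';
}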
Method 5: Place a delimiter at the end of each record. Here the '#' character marks the end of
each record:
Ames|Mary|123|Maple|Stillwater|OK74075|#Mason|Alan|90|Eastgate|Ada|OK74820|# …
(Class diagram: IOBuffer, which holds a char array for the buffer value, declares read and
write operations; VariableLengthBuffer and FixedLengthBuffer are its subclasses, each
providing its own read and write operations.)
Here the member functions Read(), Write(), Pack(), and Unpack() of class IOBuffer are
virtual functions, so that the subclasses VariableLengthBuffer and FixedLengthBuffer can
define their own implementations. This means that the class IOBuffer does not have to
include an implementation of these methods.
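A skeleton of that hierarchy (the exact signatures are illustrative assumptions; the textbook classes carry more state and methods):

#include <iostream>

class IOBuffer {
public:
    explicit IOBuffer(int maxBytes = 1000) : size(maxBytes), buffer(new char[maxBytes]) {}
    virtual ~IOBuffer() { delete[] buffer; }
    // Pure virtual: each subclass supplies its own record layout.
    virtual int Read(std::istream&) = 0;         // read a record into the buffer
    virtual int Write(std::ostream&) const = 0;  // write the buffer as a record
    virtual int Pack(const char* field) = 0;     // add a field to the buffer
    virtual int Unpack(char* field) = 0;         // extract the next field
protected:
    int size;      // capacity of the buffer
    char* buffer;  // char array holding the buffer value
};

class FixedLengthBuffer : public IOBuffer { /* fixed-size fields ... */ };
class VariableLengthBuffer : public IOBuffer { /* length-prefixed fields ... */ };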
Record Access
When looking for an individual record, it is convenient to identify the record with a key
based on the record's contents. The key should be unique, so that duplicate entries can be
avoided. For example, in the previous section's example we might want to access the "Ames
record" or the "Mason record" rather than thinking in terms of the "first record" or the
"second record". When we are looking for a record containing the last name Ames, we want
to recognize it even if the user enters the key in the form "AMES", "ames", or "Ames". To do
this we must define a standard form for keys, along with associated rules and procedures for
converting keys into this standard form. This standard form is called the canonical form of
the key.
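One plausible canonical form, sketched in C++ as "trim blanks, then uppercase" (the exact rules are a design choice, not fixed by the notes):

#include <cctype>
#include <iostream>
#include <string>

// Convert a key to canonical form: strip surrounding blanks, then uppercase.
std::string canonical(const std::string& key) {
    std::size_t b = key.find_first_not_of(' ');
    if (b == std::string::npos) return "";
    std::size_t e = key.find_last_not_of(' ');
    std::string out = key.substr(b, e - b + 1);
    for (char& c : out)
        c = std::toupper(static_cast<unsigned char>(c));
    return out;
}

int main() {
    // "AMES", "ames" and " Ames " all match once converted.
    std::cout << (canonical("ames") == canonical(" AMES ")) << '\n';  // prints 1
}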
Sequential Search
Reading through the file, record by record, looking for a record with a particular key is called
sequential searching. In general, the work required to search sequentially for a record in a file
with n records is proportional to n: the sequential search is said to be of order O(n).
This efficiency is tolerable when the search is performed on data in main memory, but not on
data that has to be extracted from a secondary storage device, owing to the high delay
involved in accessing it. Instead of extracting records from the secondary storage device one
at a time, sequentially, we can read a set of records at once from the hard disk, store them in
main memory, and do the comparisons there. This is called record blocking, sketched below.
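A C++ sketch of sequential search with record blocking (fixed 64-byte records, 16 records per block, key compared against the start of each record; all of these choices are illustrative assumptions):

#include <cstring>
#include <fstream>
#include <iostream>

const int REC_LEN = 64, RECS_PER_BLOCK = 16;

// Read a whole block of records per disk access, then compare in memory.
long findByKey(std::ifstream& in, const char* key) {
    char block[REC_LEN * RECS_PER_BLOCK];
    long rrn = 0;
    while (in.read(block, sizeof block) || in.gcount() > 0) {
        long got = in.gcount() / REC_LEN;      // whole records actually read
        for (long i = 0; i < got; ++i, ++rrn)  // in-memory comparisons
            if (std::strncmp(block + i * REC_LEN, key, std::strlen(key)) == 0)
                return rrn;                    // found: return its RRN
        if (!in) break;                        // partial final block: stop
    }
    return -1;                                 // not found
}

int main() {
    std::ifstream in("fixed.dat", std::ios::binary);
    std::cout << findByKey(in, "Ames") << '\n';
}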
There are some cases in which sequential search is superior, such as:
Repetitive hits: Searching for patterns in ASCII files.
Searching records with a certain secondary key value.
Small Search Set: Processing files with few records.
Devices/media most hospitable to sequential access: tape, binary file on disk.
Unix Tools for Sequential Processing
Some of the UNIX commands that perform sequential access are:
cat: displays the contents of the file sequentially on the console.
%cat filename
Example: %cat myfile
Ames Mary 123 Maple Stillwater OK74075
Mason Alan 90 Eastgate Ada OK74820
wc: counts the number of lines, words, and characters in the file.
%wc filename
Example: %wc myfile
2 14 76 myfile
grep: (generalized regular expression) used for pattern matching.
%grep string filename
Example: % grep Ada myfile
Mason Alan 90 Eastgate Ada OK74820
Direct Access:
The most radical alternative to searching sequentially through a file for a record is a retrieval
mechanism known as direct access. The major problem with direct access is knowing where
the required record begins. One way to know the beginning (the byte offset) of the required
record is to maintain a separate index file. The other way is by relative record number (RRN).
If a file is a sequence of records, the RRN of a record gives its position relative to the
beginning of the file: the first record in a file has RRN 0, the next has RRN 1, and so forth.
For example, if we are interested in the record with RRN 546 and our file has a fixed record
length of 128 bytes per record, the byte offset is 546 × 128 = 69,888. In general, for the
record with RRN n,
Byte offset = n × r, where r is the length of a record.
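A C++ sketch of direct access by RRN using this formula (the file name is an assumption; the record length matches the example):

#include <fstream>
#include <iostream>

const long REC_LEN = 128;

int main() {
    std::ifstream in("fixed.dat", std::ios::binary);
    long rrn = 546;
    in.seekg(rrn * REC_LEN);       // byte offset = n * r = 69,888
    char record[REC_LEN];
    if (in.read(record, REC_LEN))  // one direct access, no scanning
        std::cout << "read record with RRN " << rrn << '\n';
}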
Header Records
It is often necessary or useful to keep track of some general information about a file to assist
in future use of the file. A header record is often placed at the beginning of the file to hold
information such as the number of records, the type of records the file contains, the size of
the file, and the date and time of the file's creation and modification. Header records make a
file a self-describing object, freeing the software that accesses the file from having to know
a priori everything about its structure. The header record usually has a different structure
from the data records.
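A sketch of a fixed-size header record in C++ (the particular fields follow the list above, but the layout itself is an assumption):

#include <cstdint>
#include <cstring>
#include <fstream>

// Fixed-size header written at byte 0; the data records follow it.
struct Header {
    char         fileType[8];   // type of records the file contains
    std::int32_t recordCount;   // number of records
    std::int32_t recordLength;  // size of each record in bytes
    std::int64_t createdAt;     // creation date/time (e.g., seconds since epoch)
};

int main() {
    Header h{};
    std::memcpy(h.fileType, "FIXED", 5);
    h.recordLength = 128;
    h.createdAt = 0;  // placeholder timestamp
    std::ofstream out("withheader.dat", std::ios::binary);
    out.write(reinterpret_cast<const char*>(&h), sizeof h);
}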
File Access and File Organization
File organization is static:
Fixed Length Records.
Variable Length Records.
File access is dynamic:
Sequential Access.
Direct Access.