Unit-5 File Management
Contents
• File Overview: File Naming, File Structure, File Types, File Access, File
Attributes, File Operations, Single Level, Two Level and Hierarchical
Directory Systems, File System Layout.
• Implementing Files: Contiguous allocation, Linked List Allocation, Linked
List Allocation using Table in Memory, I-nodes.
• Directory Operations, Path Names, Directory Implementation, Shared Files
• Free Space Management: Bitmaps, Linked List
Introduction
• A file is a collection of related information recorded on secondary, non-volatile
storage such as magnetic disks, optical disks, and tapes.
• A file also serves as the medium through which a program receives its input and
delivers its output.
• Three essential requirements for long-term information storage:
1. It must be possible to store a very large amount of information.
2. The information must survive the termination of the process using it.
3. Multiple processes must be able to access the information at once.
• Files are logical units of information created by processes. A disk will usually contain
thousands or even millions of them, each one independent of the others.
• A file system is a method of organizing files on physical media, such as hard disks,
CDs, and flash drives.
• Information stored in files must be persistent, that is, not be affected by process
creation and termination.
File Naming
File Structure
Figure 5-2. Three kinds of files. (a) Byte sequence. (b) Record sequence. (c) Tree.
• The file in Fig. 5-2(a) is an unstructured sequence of bytes. In effect, the
operating system does not know or care what is in the file. All it sees are
bytes.
• Both UNIX and Windows use this approach.
• In Fig. 5-2(b), a file is a sequence of fixed-length records, each with some
internal structure.
• No current general-purpose system uses this model as its primary file system
any more, but back in the days of 80-column punched cards and 132-character
line printer paper this was a common model on mainframe computers.
• In Fig. 5-2(c), a file consists of a tree of records, not necessarily all the same
length, each containing a key field in a fixed position in the record. The tree is
sorted on the key field, to allow rapid searching for a particular key.
• This model is used on some large mainframe computers for commercial data
processing.
File Types
• Many operating systems support several types of files. UNIX and Windows, for
example, have regular files and directories.
• Regular files are the ones that contain user information. Application programs are
responsible for understanding the structure and content of any specific regular file. All
the files of Fig. 5-2 are regular files. Examples include Word documents and Excel spreadsheets.
• Regular files are generally either ASCII files or binary files. ASCII files consist of
lines of text (e.g., C/C++, Perl, and HTML source files). Binary files include executable
files, compiled programs, spreadsheets, and image files.
• Directories are system files for maintaining the structure of the file system.
• Character special files are related to input/output and used to model serial I/O
devices, such as terminals, printers, and networks.
• Block special files are device files that transfer data one block at a time (one block
is 512 bytes to 32 KB here); they are used to model disks.
• For example, in Fig. 5-3(a) we see a simple executable binary file taken from an
early version of UNIX. Although technically the file is just a sequence of bytes, the
operating system will execute a file only if it has the proper format.
File Access
• A file access method determines how the data in a file is located and read into
memory.
• Sequential access: records are accessed in a predefined order, and the information
stored in the file is processed one record after another, e.g., on magnetic tape.
Operations: read next, write next, reset/rewind.
• Random access (also called direct access): this method allows a record to be accessed
directly. Each record has its own address, at which it can be read or written.
The user now says "read n" rather than "read next".
• Indexed sequential access: this method is built on top of sequential access. It uses
an index to position the file pointer when accessing the file.
• Random access files are essential for many applications, for example, database
systems.
• If an airline customer calls up and wants to reserve a seat on a particular flight, the
reservation program must be able to access the record for that flight without having to
read the records for thousands of other flights first.
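The difference between "read next" and "read n" can be sketched with fixed-length records. This is only an illustration: the 16-byte record size, the record contents, and the helper names are assumptions, and an in-memory file (io.BytesIO) stands in for a disk file.

```python
import io

RECORD_SIZE = 16  # assumed fixed record length, chosen for illustration

def make_file(n_records):
    """Build an in-memory file holding n fixed-length records."""
    records = (f"flight{i:04d}".encode().ljust(RECORD_SIZE) for i in range(n_records))
    return io.BytesIO(b"".join(records))

def read_next(f):
    """Sequential access: return the record at the current position."""
    return f.read(RECORD_SIZE)

def read_n(f, n):
    """Random access: seek straight to record n, then read it."""
    f.seek(n * RECORD_SIZE)
    return f.read(RECORD_SIZE)
```

With random access, reaching record 742 costs one seek and one read, no matter how many records precede it.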
File Attributes
• A file has a name and data. In addition, the operating system stores meta-information
about it, such as creation date and time, location, current size, and last modified
date. This information is called the attributes of the file.
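As a small illustration, Python's os.stat returns several of these attributes for an existing file. The helper names below are hypothetical, and the scratch file exists only for the demonstration.

```python
import os
import tempfile
import time

def file_attributes(path):
    """Return a few attributes the OS keeps alongside the file's data."""
    st = os.stat(path)
    return {
        "size_bytes": st.st_size,
        "last_modified": time.ctime(st.st_mtime),
        "is_directory": os.path.isdir(path),
    }

def demo_attributes():
    """Create a 5-byte scratch file, read its attributes, then remove it."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"hello")
        path = tmp.name
    attrs = file_attributes(path)
    os.unlink(path)
    return attrs
```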
An Example Program Using File-System Calls
Figure 4-5. A simple program to copy a file.
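As a rough stand-in for the kind of program the figure refers to (it is not the figure's listing), a file copy can be sketched with Python's thin wrappers around the open, read, write, and close system calls. The 4096-byte buffer size and the function name are assumptions, and a POSIX-like system is assumed.

```python
import os

BUF_SIZE = 4096  # assumed buffer size

def copy_file(src, dst):
    """Copy src to dst using open/read/write/close system-call wrappers."""
    in_fd = os.open(src, os.O_RDONLY)
    try:
        out_fd = os.open(dst, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
        try:
            while True:
                chunk = os.read(in_fd, BUF_SIZE)
                if not chunk:            # an empty read means end of file
                    break
                while chunk:             # write may accept fewer bytes than offered
                    written = os.write(out_fd, chunk)
                    chunk = chunk[written:]
        finally:
            os.close(out_fd)
    finally:
        os.close(in_fd)
```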
File-system implementation
• Until now, we have only viewed the file system from the user's point of view.
• Now it's time to view the file system from the implementor's point of view.
• File system implementors are interested in how files and directories are
stored, how disk space for file system is managed, and how to make
everything work efficiently and reliably.
File-System Layout
• File systems are stored on disks. Most disks can be divided up into one or
more partitions, with independent file systems on each partition. Sector 0 of
the disk is called the MBR (master boot record) and is used to boot the
computer.
• The end of the MBR contains the partition table, which gives the starting and
ending addresses of each partition of the disk.
• Whenever the computer is booted, the BIOS reads in and executes the master boot
record (MBR).
• The first thing the MBR program does is locate the active partition, read in its
first block, which is called the boot block, and execute it.
• The superblock contains all the key parameters about the file system and is
read into memory when the computer is booted or the file system is first
touched.
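The partition-table lookup described above can be sketched for the classic PC MBR format. The sketch assumes that format: a 512-byte sector, four 16-byte entries starting at byte 446, and the signature bytes 0x55 0xAA at the end. Each entry there records a starting sector and a sector count, from which the ending address follows. The function name is hypothetical.

```python
import struct

SECTOR_SIZE = 512
TABLE_OFFSET = 446   # partition table position in a classic MBR
ENTRY_SIZE = 16      # four 16-byte entries
ACTIVE_FLAG = 0x80   # status byte of the active (bootable) partition

def parse_partition_table(mbr):
    """Return (is_active, start_sector, sector_count) for each used entry."""
    if len(mbr) != SECTOR_SIZE or mbr[510:512] != b"\x55\xaa":
        raise ValueError("not a valid MBR sector")
    partitions = []
    for i in range(4):
        begin = TABLE_OFFSET + i * ENTRY_SIZE
        entry = mbr[begin:begin + ENTRY_SIZE]
        # status, 3 skipped bytes, type, 3 skipped bytes, start LBA, sector count
        status, ptype, start, count = struct.unpack("<B3xB3xII", entry)
        if ptype != 0:   # type 0 marks an unused entry
            partitions.append((status == ACTIVE_FLAG, start, count))
    return partitions
```

The boot code would then pick the entry whose first field is true and read the sector at its starting address, which is the boot block.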
Contiguous Allocation
Figure 4-10. (a) Contiguous allocation of disk space for 7 files. (b) The state of the disk after files D and F have been removed.
• Contiguous allocation requires that each file occupy a set of contiguous
blocks on the disk. The word contiguous means continuous.
Advantages:
• Contiguous allocation supports both sequential and direct access.
• Reading is very fast because the number of seeks is minimal: after one seek to the
first block, the rest of the file follows.
Disadvantages:
• Contiguous allocation suffers from external fragmentation: as files are removed, the
disk fills with holes. Internal fragmentation also occurs in the last block of a file.
• In terms of disk space utilization, this method is inefficient.
• It is difficult to increase the size of a file because growth depends on the
availability of contiguous free blocks.
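External fragmentation can be demonstrated with a toy disk. The sketch below is an assumption-laden illustration, not a real allocator: the disk is a Python list with one slot per block, files are placed first-fit, and the function names are hypothetical.

```python
def first_fit(disk, name, size):
    """Place `size` (>= 1) contiguous blocks for `name`; return start or None."""
    run = 0
    for i, owner in enumerate(disk):
        run = run + 1 if owner is None else 0   # length of the current free run
        if run == size:
            start = i - size + 1
            disk[start:i + 1] = [name] * size
            return start
    return None

def remove(disk, name):
    """Free every block owned by `name`, leaving a hole."""
    for i, owner in enumerate(disk):
        if owner == name:
            disk[i] = None
```

On a 12-block disk holding A (4 blocks), B (2), and C (4), removing B leaves four free blocks in two separate holes, so a 3-block file cannot be placed even though enough space is free in total.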
Linked-List Allocation
• Linked-list allocation avoids the external fragmentation of contiguous allocation,
because any free block can be used. Each file is kept as a linked list of disk blocks.
Random access is slow, however, since reaching block n means following n pointers.
• Each disk block allocated to a file contains a pointer which points to the
next disk block allocated to the same file.
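A minimal sketch of this scheme follows, assuming a dictionary as the disk, a tuple (pointer, data) as a block, and -1 as a hypothetical end-of-chain marker. It also counts block reads to show why random access is slow.

```python
END = -1  # assumed end-of-chain marker

def write_file(disk, free_blocks, chunks):
    """Store chunks in any free blocks, chaining them; return the first block."""
    blocks = [free_blocks.pop(0) for _ in chunks]
    for pos, (blk, data) in enumerate(zip(blocks, chunks)):
        nxt = blocks[pos + 1] if pos + 1 < len(blocks) else END
        disk[blk] = (nxt, data)          # each block holds (pointer, payload)
    return blocks[0] if blocks else END

def read_block(disk, first, n):
    """Reach the file's n-th block by following n pointers.

    Returns (data, number_of_blocks_read)."""
    blk = first
    for _ in range(n):
        blk = disk[blk][0]
    return disk[blk][1], n + 1
```

The blocks of a file need not be adjacent, so no space is lost to holes, but reading block n requires reading the n blocks before it.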
Linked-List Allocation Using a Table in Memory
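The idea named in this heading is commonly described as moving the pointer out of each disk block into a table in main memory (a file allocation table), indexed by block number. The sketch below assumes a Python list as that table and -1 as a hypothetical end-of-chain marker; the block numbers in the usage note are made up for illustration.

```python
END = -1  # assumed end-of-chain marker

def chain(table, first):
    """Follow the in-memory table from `first`; return the file's block numbers."""
    blocks = []
    blk = first
    while blk != END:
        blocks.append(blk)
        blk = table[blk]     # next block of the same file, found without disk I/O
    return blocks
```

For a file that starts at block 4 with table[4] = 7, table[7] = 2, table[2] = 10, table[10] = 12, and table[12] = END, chain(table, 4) yields [4, 7, 2, 10, 12]. The chain is still followed link by link, but entirely in memory.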
I-nodes (index-node)
• Another method for keeping track of which blocks belong to which file is to associate
with each file a data structure called an i-node (index-node), which lists
the attributes and disk addresses of the file’s blocks.
• Unlike the in-memory table of the previous method, the i-node needs to be in memory
only when the corresponding file is open.
• Extra disk blocks may be reserved to store the block addresses of a large file.
Figure 4-13. An example i-node.
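A simplified i-node can be sketched as a record with attributes, a few direct block addresses, and the address of one extra block that holds the remaining addresses. The limit of 8 direct addresses, the single level of indirection, and all names below are assumptions for illustration; real i-node layouts differ.

```python
from dataclasses import dataclass, field
from typing import Optional

N_DIRECT = 8  # assumed number of block addresses stored in the i-node itself

@dataclass
class Inode:
    attributes: dict
    direct: list = field(default_factory=list)   # first N_DIRECT block addresses
    indirect: Optional[int] = None               # block holding further addresses

def build_inode(attrs, blocks, disk, indirect_addr):
    """Keep the first N_DIRECT addresses in the i-node, the rest in an extra block."""
    inode = Inode(attrs, list(blocks[:N_DIRECT]))
    if len(blocks) > N_DIRECT:
        disk[indirect_addr] = list(blocks[N_DIRECT:])
        inode.indirect = indirect_addr
    return inode

def block_address(inode, disk, n):
    """Return the disk address of the file's n-th block."""
    if n < N_DIRECT:
        return inode.direct[n]
    return disk[inode.indirect][n - N_DIRECT]
```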
Directories
Create: A directory is created. It is empty except for dot and dotdot, which
are put there automatically by the system.
Delete: A directory is deleted. Only an empty directory can be deleted.
Opendir: Directories can be read, but a directory must be opened before it is read.
For example, to list all the files in a directory, a listing program opens the
directory and reads out the names of all the files it contains.
Closedir: When a directory has been read, it should be closed to free up internal
table space.
Readdir: This call returns the next entry in an open directory.
Rename: Directories can be renamed just like files.
Link: Linking is a technique that allows a file to appear in more than one
directory.
Unlink: A directory entry is removed.
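Most of these operations have close counterparts in Python's os module, which the sketch below uses on a scratch directory. The directory and file names are made up, and os.scandir is used here as an approximation of the opendir/readdir/closedir sequence (it does not report the dot and dotdot entries).

```python
import os
import tempfile

def directory_demo():
    """Create, read, rename, and delete a scratch directory; return its listing."""
    base = tempfile.mkdtemp()
    d = os.path.join(base, "docs")
    os.mkdir(d)                                       # Create
    open(os.path.join(d, "notes.txt"), "w").close()   # put one file in it
    with os.scandir(d) as it:                         # Opendir (Closedir on exit)
        names = sorted(entry.name for entry in it)    # Readdir, one entry per step
    renamed = os.path.join(base, "papers")
    os.rename(d, renamed)                             # Rename
    os.unlink(os.path.join(renamed, "notes.txt"))     # Unlink the directory entry
    os.rmdir(renamed)                                 # Delete: only works when empty
    os.rmdir(base)
    return names
```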
Path Names
• When the file system is organized as a directory tree, some way is needed
for specifying file names.
• Two different methods are commonly used.
1. Absolute path name
2. Relative path name
• An absolute path name consists of the path from the root directory to the
file. As an example, the path /usr/ast/mailbox means that the root directory
contains a subdirectory usr, which in turn contains a subdirectory ast, which
contains the file mailbox.
• Absolute path names always start at the root directory and are unique.
Windows: \usr\ast\mailbox
UNIX: /usr/ast/mailbox
MULTICS: >usr>ast>mailbox
• The relative path name is used in conjunction with the concept of the working
directory (also called the current directory).
• A user can designate one directory as the current working directory, in which case
all path names not beginning at the root directory are taken relative to the working
directory.
• For example, if the current working directory is /usr/ast, then the file whose
absolute path is /usr/ast/mailbox can be referenced simply as mailbox.
Figure 4-8. A UNIX directory tree.
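The rule for interpreting a path name can be sketched with Python's posixpath module, which handles UNIX-style paths as plain strings. The resolve helper is a hypothetical name, and the resolution is purely lexical (it does not consult a real file system).

```python
import posixpath

def resolve(working_dir, path):
    """Return the absolute form of `path`, relative to working_dir if needed."""
    if posixpath.isabs(path):            # begins at the root: already absolute
        return posixpath.normpath(path)
    return posixpath.normpath(posixpath.join(working_dir, path))
```

With /usr/ast as the working directory, resolve("/usr/ast", "mailbox") gives /usr/ast/mailbox, matching the example above.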
Implementing Directories
Figure 4-14. (a) A simple directory containing fixed-size entries with the disk addresses and attributes in
the directory entry. (b) A directory in which each entry just refers to an i-node.
Figure 4-15. Two ways of handling long file names in a directory. (a) In-line. (b) In a heap.
• One alternative is to give up the idea that all directory entries are the same
size.
• In the in-line method, each directory entry contains a fixed portion, typically
starting with the length of the entry, and then followed by data with a fixed
format, usually including the owner, creation time, protection information,
and other attributes.
• Each file name is terminated by a special character (usually 0), represented in the
figure by a box with a cross in it (⊠).
• A disadvantage of this method is that when a file is removed, a variable-
sized gap is introduced into the directory into which the next file to be
entered may not fit.
• Another way to handle variable-length names is to make the directory
entries themselves all fixed length and keep the file names together in a heap
at the end of the directory.
• This method has the advantage that when an entry is removed, the next file
entered will always fit there.
• However, the heap must be managed, and page faults can still occur while processing
file names.
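The heap method can be sketched as follows. This is a simplified illustration under stated assumptions: entries are Python tuples rather than packed bytes, names are terminated by a 0 byte, and removal (which is where heap management becomes necessary) is left out. The class and method names are hypothetical.

```python
class HeapDirectory:
    """Fixed-length entries; the names live together in a heap at the end."""

    def __init__(self):
        self.entries = []         # each entry: (offset into heap, attributes)
        self.heap = bytearray()   # 0-terminated names stored back to back

    def add(self, name, attributes):
        """Append the name to the heap and record its offset in a new entry."""
        offset = len(self.heap)
        self.heap += name.encode() + b"\0"
        self.entries.append((offset, attributes))

    def name_at(self, index):
        """Read the name of entry `index` up to its terminating 0 byte."""
        offset, _ = self.entries[index]
        end = self.heap.index(b"\0", offset)
        return self.heap[offset:end].decode()

    def lookup(self, name):
        """Linear search; return the attributes of `name`, or None."""
        for i, (_, attributes) in enumerate(self.entries):
            if self.name_at(i) == name:
                return attributes
        return None
```

Because every entry has the same size regardless of name length, a freed entry slot can always be reused by the next file.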
Shared Files
Figure 4-17. (a) Situation prior to linking. (b) After the link is created. (c) After the original owner removes the file.
• In the first solution, disk blocks are not listed in directories, but in a little
data structure associated with the file itself. The directories would then
point just to the little data structure. This is the approach used in UNIX
(where the little data structure is the i-node).
• In the second solution, B links to one of C’s files by having the system
create a new file, of type LINK, and entering that file in B’s directory.
• The new file contains just the path name of the file to which it is linked.
When B reads from the linked file, the operating system sees that the file
being read from is of type LINK, looks up the name of the file, and reads
that file. This approach is called symbolic linking, to contrast it with traditional (hard) linking.
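Both kinds of link can be tried out with os.link and os.symlink. The sketch below assumes a POSIX-like system and a file system that supports both calls; the file names are made up. It also shows what happens after the original owner removes the file.

```python
import os
import tempfile

def link_demo():
    """Return (same_inode, link_count, hard_link_ok, symlink_ok)."""
    base = tempfile.mkdtemp()
    original = os.path.join(base, "c_file")
    hard = os.path.join(base, "b_hard")
    soft = os.path.join(base, "b_soft")
    with open(original, "w") as f:
        f.write("shared data")
    os.link(original, hard)        # second directory entry for the same i-node
    os.symlink(original, soft)     # new file that holds only a path name
    same_inode = os.stat(original).st_ino == os.stat(hard).st_ino
    link_count = os.stat(original).st_nlink
    os.unlink(original)            # the original owner removes the file
    with open(hard) as f:
        hard_ok = f.read() == "shared data"   # data survives via the hard link
    soft_ok = os.path.exists(soft)            # follows the link; its target is gone
    os.unlink(hard)
    os.unlink(soft)
    os.rmdir(base)
    return same_inode, link_count, hard_ok, soft_ok
```

The hard link keeps the data alive because the i-node still has one directory entry pointing to it, while the symbolic link is left pointing at a name that no longer exists.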
Disk-Space Management
• Because files are normally stored on disk, management of disk space is one of the
main concerns of a file system.
Block Size
• The main question that arises when storing files in fixed-size blocks is how large
the block should be.
• If the block is too large, space is wasted; if the block is too small, time is
wasted, because most files then span many blocks. So, to choose a suitable block size,
some information about the file-size distribution is required.
Keeping track of free blocks
• Once a block size has been chosen, the next issue is how to keep track of free
blocks. Two methods are widely used:
• Using a linked list: A linked list of disk blocks is kept, with each block holding as
many free disk block numbers as will fit.
Figure. Storing the free list on a linked list.
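How many free block numbers fit in one list block is simple arithmetic: the block size divided by the size of a block number, minus one slot for the pointer to the next list block. The sketch below assumes 1-KB blocks and 32-bit (4-byte) block numbers purely for illustration; the function names are hypothetical.

```python
def numbers_per_block(block_size=1024, number_size=4):
    """Free block numbers per list block; one slot is kept for the next pointer."""
    return block_size // number_size - 1

def build_free_list(free_numbers, per_block):
    """Pack free block numbers into list blocks of (numbers, index_of_next)."""
    groups = [free_numbers[i:i + per_block]
              for i in range(0, len(free_numbers), per_block)]
    return [(group, i + 1 if i + 1 < len(groups) else None)
            for i, group in enumerate(groups)]
```

Under these assumptions each list block holds 255 free block numbers, so 600 free blocks need three list blocks.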
• Using a bitmap: A disk with n blocks requires a bitmap with n bits. Free blocks are
represented by 1s and allocated blocks by 0s.
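A bitmap free-space manager can be sketched in a few lines, using the convention stated above (1 = free, 0 = allocated). The class name, the lowest-free-block-first policy, and the linear scan are assumptions made for illustration.

```python
class Bitmap:
    """One bit per disk block: 1 = free, 0 = allocated."""

    def __init__(self, n_blocks):
        self.n = n_blocks
        self.bits = bytearray([0xFF] * ((n_blocks + 7) // 8))  # all blocks free

    def is_free(self, blk):
        return bool((self.bits[blk // 8] >> (blk % 8)) & 1)

    def allocate(self):
        """Mark the lowest-numbered free block as allocated and return it."""
        for blk in range(self.n):
            if self.is_free(blk):
                self.bits[blk // 8] &= ~(1 << (blk % 8)) & 0xFF   # clear the bit
                return blk
        return None                                                # disk is full

    def release(self, blk):
        """Mark a block as free again."""
        self.bits[blk // 8] |= 1 << (blk % 8)                      # set the bit
```

A disk with n blocks costs n bits here, for example 2 bytes for a 10-block disk, however many blocks are free.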