File System Structure
Most operating systems use a layered approach for file systems, as they do for many
other tasks. Each layer of the file system is responsible for certain activities.
The image shown below illustrates how the file system is divided into layers, and
also the functionality of each layer.
o When an application program requests a file, the request is first directed to the
logical file system. The logical file system holds the metadata of the file and the
directory structure. If the application program does not have the required
permissions for the file, this layer raises an error. The logical file system also
verifies the path to the file.
o Files are divided into logical blocks, but they must be stored on and retrieved
from the hard disk, which is divided into tracks and sectors. To store and retrieve
files, the logical blocks therefore need to be mapped to physical blocks. This
mapping is done by the file organization module, which is also responsible for
free-space management.
o Once the file organization module has decided which physical block the
application program needs, it passes this information to the basic file system.
The basic file system is responsible for issuing the commands to I/O control in
order to fetch those blocks.
o I/O control contains the code through which the hard disk can be accessed; this
code is known as the device driver. I/O control is also responsible for handling
interrupts.
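The flow through these layers can be sketched in a few lines of Python. This is a
hypothetical illustration only: the function names, the metadata table, and the
contiguous-allocation assumption are all invented for the example, not taken from any
real file system.

```python
# Illustrative sketch of a read request passing through the file-system layers.
# All names and the metadata table below are assumptions for this example.

# Logical file system: per-file metadata and a permission check.
METADATA = {"/home/a.txt": {"owner": "alice", "start_block": 7}}

def logical_layer(path, user):
    meta = METADATA.get(path)
    if meta is None or meta["owner"] != user:
        raise PermissionError("access denied or file not found")
    return meta

# File organization module: map a logical block to a physical block,
# assuming contiguous allocation for simplicity.
def to_physical(meta, logical_block):
    return meta["start_block"] + logical_block

# Basic file system: issue a generic block-read command to I/O control.
def basic_read(physical_block):
    return f"READ block {physical_block}"

def read_file_block(path, user, logical_block):
    meta = logical_layer(path, user)         # logical file system
    phys = to_physical(meta, logical_block)  # file organization module
    return basic_read(phys)                  # basic file system -> I/O control
```

A request that fails the permission check never reaches the lower layers, matching the
layered design described above.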
Different Types of File Systems
There are several types of file systems, each designed for specific purposes
and compatible with different operating systems. Some common file system
types include:
FAT32 (File Allocation Table 32): Commonly used in older versions of
Windows and compatible with various operating systems.
NTFS (New Technology File System): Used in modern Windows
operating systems, offering improved performance, reliability, and security
features.
ext4 (Fourth Extended File System): Used in Linux distributions,
providing features such as journaling, large file support, and extended file
attributes.
HFS+ (Hierarchical File System Plus): Used in macOS systems prior to
macOS High Sierra, offering support for journaling and case-insensitive
file names.
APFS (Apple File System): Introduced in macOS High Sierra and the
default file system for macOS and iOS devices, featuring enhanced
performance, security, and snapshot capabilities.
ZFS (Zettabyte File System): A high-performance file system known for
its advanced features, including data integrity, volume management, and
efficient snapshots.
Various on-disk data structures are used to implement a file system. These structures
may vary depending upon the operating system.
The volume control block contains all the information about a volume, such as the
number of blocks, the size of each block, the partition table, and pointers to free
blocks and free FCBs. In the UNIX file system it is known as the superblock; in NTFS,
this information is stored inside the master file table.
A directory structure (per file system) contains file names and pointers to the
corresponding FCBs. In UNIX, it includes the inode numbers associated with the file names.
The file control block (FCB) contains all the details about a file, such as ownership,
permissions, and file size. In UFS, these details are stored in the inode; in NTFS, they
are stored inside the master file table, which uses a relational-database-like structure.
A typical file control block is shown in the image below.
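A file control block can be sketched as a simple record. The field names below are
illustrative choices based on the details listed above (ownership, permissions, size,
block pointers); real FCB and inode layouts differ between file systems.

```python
from dataclasses import dataclass, field

# A simplified, hypothetical file control block; real inode/MFT layouts
# contain many more fields (timestamps, link counts, extended attributes).
@dataclass
class FileControlBlock:
    owner: str
    permissions: str        # e.g. "rw-r--r--"
    size: int               # file size in bytes
    data_blocks: list = field(default_factory=list)  # pointers to disk blocks

fcb = FileControlBlock(owner="alice", permissions="rw-r--r--",
                       size=4096, data_blocks=[12, 13, 14])
```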
In-memory data structures are used for file-system management as well as for
performance improvement via caching. This information is loaded at mount time and
discarded at unmount.
The in-memory mount table contains the list of all devices mounted on the system.
Whenever a device is mounted, an entry for it is made in the mount table.
The directory-structure cache is the list of directories recently accessed by the CPU.
Since the directories in this list are likely to be accessed again in the near future,
it is better to keep them temporarily in a cache.
The system-wide open-file table is the list of all files open in the system at a
particular time. Whenever a user opens a file for reading or writing, an entry is made
in this table.
The per-process open-file table is the list of files opened by each process. Since the
system-wide table already contains an entry for every open file, this table contains
only pointers to the appropriate entries in the system-wide table.
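The relationship between the two open-file tables can be sketched as follows. The data
layout and function name are assumptions for illustration; a real kernel also tracks
access mode, file offset, and reference counts per entry.

```python
# Sketch: the system-wide table holds one entry per open file; each
# per-process table stores only an index into the system-wide table.

system_wide = []     # list of {"name": ..., "open_count": ...}
per_process = {}     # pid -> list of indices into system_wide

def open_file(pid, name):
    # Reuse the system-wide entry if the file is already open elsewhere.
    for i, entry in enumerate(system_wide):
        if entry["name"] == name:
            entry["open_count"] += 1
            break
    else:
        system_wide.append({"name": name, "open_count": 1})
        i = len(system_wide) - 1
    per_process.setdefault(pid, []).append(i)   # per-process pointer only
    return i
```

When two processes open the same file, there is still only one system-wide entry; each
process merely points at it.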
Directory Implementation
Disadvantage:
The major drawback of a hash table is that it generally has a fixed size, and the
hash function depends on that size. Even so, this method is usually faster than a
linear search through the entire directory using a linked list.
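A hash-table directory can be sketched with a fixed number of buckets, each chaining
colliding names. The table size and function names are illustrative assumptions; the
fixed size is exactly the drawback noted above.

```python
# Hypothetical fixed-size hash-table directory: average O(1) lookup,
# but the number of buckets is fixed when the table is created.

TABLE_SIZE = 8
buckets = [[] for _ in range(TABLE_SIZE)]   # each bucket chains collisions

def add_entry(name, fcb_ptr):
    buckets[hash(name) % TABLE_SIZE].append((name, fcb_ptr))

def lookup(name):
    for entry_name, fcb_ptr in buckets[hash(name) % TABLE_SIZE]:
        if entry_name == name:
            return fcb_ptr
    return None
```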
Free Space Management
1. Bitmap
In this approach, each disk block is represented by one bit in a bitmap; here, a 1
indicates a free block and a 0 an allocated block.
Advantages:
Simple to understand.
Finding the first free block is efficient. It requires scanning the bitmap word by
word (here, a word is a group of 8 bits) for a non-zero word, since a 0-valued
word has all bits 0. The first free block is then found by scanning for the first
1 bit in the non-zero word.
Disadvantages:
To find a free block, the operating system may need to scan the entire bitmap,
which is time-consuming.
The efficiency of this method reduces as the disk size increases.
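The word-by-word scan described above can be written directly. This sketch assumes, as
in the description, an 8-bit word and the convention that a 1 bit marks a free block.

```python
# Scan a free-space bitmap for the first free block: skip all-zero words,
# then find the first set bit in the first non-zero word.
# Convention assumed here: 1 = free block, each "word" is one 8-bit byte.

def first_free_block(bitmap: bytes) -> int:
    for word_index, word in enumerate(bitmap):
        if word != 0:                      # a non-zero word has a free block
            for bit in range(8):
                if word & (0x80 >> bit):   # scan bits from high to low
                    return word_index * 8 + bit
    return -1                              # no free block anywhere
```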
2. Linked List
In this approach, the free disk blocks are linked together: each free block
contains a pointer to the next free block. The block number of the very first free
disk block is stored at a separate location on disk and is also cached in memory.
3. Grouping
This approach stores the addresses of free blocks in the first free block. The
first free block stores the addresses of, say, n free blocks. Of these n blocks,
the first n-1 are actually free, and the last block contains the addresses of the
next n free blocks. An advantage of this approach is that the addresses of a large
group of free disk blocks can be found easily.
Advantage:
A large number of free blocks can be found quickly using this method.
Disadvantage:
The main disadvantage is that the list must be updated whenever one of the blocks
it references is allocated.
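Grouping can be sketched by letting each index block hold n-1 free-block addresses plus
a pointer to the next index block. The dictionary below simulates disk contents; all
block numbers are invented for the example.

```python
# Grouping sketch: the first free block holds the addresses of a group of
# free blocks; the last address in each group names the next index block.

# Simulated disk: index-block number -> addresses stored in that block.
index_blocks = {
    2:  [4, 7, 10],      # blocks 4 and 7 are free; block 10 is the next index
    10: [11, 15, None],  # blocks 11 and 15 are free; no further group
}

def free_blocks_from(index_block):
    """Collect all free block numbers reachable from an index block."""
    free = []
    while index_block is not None:
        *group, index_block = index_blocks[index_block]  # split off next pointer
        free.extend(group)
    return free
```

One disk read per group yields n-1 free-block addresses at once, which is why large
groups of free blocks are easy to find.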
4. Counting
This approach stores the address of the first free disk block and a number n
of free contiguous disk blocks that follow the first block. Every entry in the list
would contain:
Address of first free disk block.
A number n.
Advantages:
Using this method, a large group of contiguous free blocks can be found easily and
quickly.
The list formed by this method is significantly smaller in size.
Disadvantage:
Each entry keeps account of a run of other free blocks in addition to the first
block's address, so each individual entry requires more space than a single block
address.
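The counting representation amounts to run-length encoding of the free list. This
sketch (function name invented) compresses a plain list of free block numbers into the
(address, count) entries described above.

```python
# Counting sketch: represent free space as (first_block, n) pairs, one per
# run of contiguous free blocks -- far more compact than listing every block.

def runs_from_free_list(free_blocks):
    """Compress a sorted list of free block numbers into (start, n) runs."""
    runs = []
    for b in sorted(free_blocks):
        if runs and runs[-1][0] + runs[-1][1] == b:
            runs[-1] = (runs[-1][0], runs[-1][1] + 1)  # extend the current run
        else:
            runs.append((b, 1))                        # start a new run
    return runs
```

Six scattered free blocks collapse to three entries here; on a real disk, where free
space tends to be contiguous, the saving is much larger.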
Recovery
Consistency Checking
Storing certain data structures (e.g., directories and inodes) in memory and caching
disk operations can speed up performance, but what happens in the event of a system crash?
All volatile memory structures are lost, and the information stored on the hard drive may
be left in an inconsistent state.
A consistency checker (fsck in UNIX; chkdsk or scandisk in Windows) is often run at boot
time or mount time, particularly if a file system was not closed down properly. Some of
the problems that these tools look for include:
Disk blocks allocated to files and also listed on the free list.
Disk blocks neither allocated to files nor on the free list.
Disk blocks allocated to more than one file.
The number of disk blocks allocated to a file inconsistent with the file's stated size.
Properly allocated files / inodes which do not appear in any directory entry.
Link counts for an inode not matching the number of references to that inode in the
directory structure.
Two or more identical file names in the same directory.
Illegally linked directories, e.g. cyclical relationships where those are not allowed, or
files/directories that are not accessible from the root of the directory tree.
Consistency checkers will often collect questionable disk blocks into new files with names
such as chk00001.dat. These files may contain valuable information that would otherwise
be lost, but in most cases they can be safely deleted (returning those disk blocks to the
free list).
UNIX caches directory information for reads, but any changes that affect space allocation
or metadata are written synchronously, before any of the corresponding data blocks are
written.