Chapter 4 File System

The document discusses file systems and files. It covers topics like what files are, how they are structured and named, the different types of files, how files are accessed, and file attributes. Key points include:
- Files store information on disks and are accessed by processes using unique names. They can be structured as byte sequences, record sequences, or trees.
- Common file types include regular files, directories, character special files, and block special files. File extensions indicate file type and internal structure.
- Files can be accessed sequentially or randomly via operations like read, write, seek, open, and close.
- File attributes provide metadata about files like size, permissions, timestamps, and flags to
Copyright © All Rights Reserved
Chapter 4: FILE SYSTEMS

Files are logical units of information created by processes


A disk usually contains thousands of files, sometimes many more.
Processes can create new files and read existing ones.
Information stored in a file must be persistent, that is, not affected by process creation and termination.
A file should disappear only when its owner explicitly deletes it.
Files are managed by the OS.
File management includes:

1. Structuring
2. Naming
3. Accessing
4. Protecting
5. Implementing, etc.

How files are managed is one of the major concerns in system design.


The part of the OS dealing with files is known as the file system.
Files from the user's point of view

1. File naming

Files are an abstraction mechanism: they provide a way to store information on the disk and read it back later whenever required.
The most important characteristic of any abstraction mechanism is the way the objects being managed are named.
When a process creates a file, it gives the file a name.
When the process terminates, the file continues to exist and can be accessed by other processes using its name.
The rules for naming files vary from system to system, but all current operating systems allow strings of one to eight letters as legal file names.

Often digits and special characters are also permitted.
Example file name: 2urgent!
Many file systems support names as long as 255 characters.
UNIX distinguishes between upper- and lowercase letters, so maria, Maria and MARIA are three distinct files.
In MS-DOS, all these names refer to the same file.
Windows 95 and 98 use the MS-DOS file system, FAT-16.
Windows 98 introduced some extensions to FAT-16, leading to FAT-32.
Windows NT, 2000 and XP support FAT, but these NT-based systems also have their own native file system, NTFS.
Many operating systems support two-part file names, with the two parts separated by a period, as in prog.c.
The part following the period is called the file extension.
In MS-DOS, a file name consists of 1 to 8 characters plus an optional extension of 1 to 3 characters.
In UNIX, a file may have two or more extensions.
Example: homepage.html.zip
This is a web page (homepage.html) that has been compressed using the zip program.

Extension   Meaning
file.bak    Backup file
file.c      C source program
file.gif    CompuServe Graphical Interchange Format image
file.hlp    Help file
file.html   World Wide Web HyperText Markup Language document
file.jpg    Still picture encoded with the JPEG standard
file.mp3    Music encoded in MPEG layer 3 audio format
file.mpg    Movie encoded with the MPEG standard
file.o      Object file (compiler output, not yet linked)
file.pdf    Portable Document Format file
file.ps     PostScript file
file.tex    Input for the TeX formatting program
file.txt    General text file
file.zip    Compressed archive
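In UNIX the extension is only a naming convention that programs may rely on. The OS itself does not enforce it. The following is a minimal sketch that takes the outermost extension of a name to be the text after the last period. The function name file_extension is hypothetical.

```c
#include <string.h>

/* Return the extension of a file name: the text after the LAST period,
 * or "" if there is none. For "homepage.html.zip" this yields "zip".
 * Assumption of this sketch: a leading period (UNIX hidden files such as
 * ".profile") is not treated as an extension separator. */
const char *file_extension(const char *name)
{
    const char *dot = strrchr(name, '.');
    if (dot == NULL || dot == name)
        return "";
    return dot + 1;
}
```

Taking the last period means the extension reflects the most recent transformation applied to the file (zip, in the example above).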

2. File structure

Figure 1 Three kinds of files. (a) Byte sequence. (b) Record sequence. (c) Tree

Files can be structured in several ways. Three common possibilities are:
Byte sequence
Record sequence
Tree

Byte sequence:

The OS does not know or care what is in the file; all it sees are bytes. This provides maximum flexibility.

User programs can put anything they want in their files and name them in any way that is convenient.

All versions of UNIX, MS-DOS and Windows use this file model.

Record sequence

A file is a sequence of fixed-length records, each with some internal structure.

A read operation returns one record, and a write operation overwrites or appends one record.
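The record-sequence model can be sketched as a toy in-memory structure. It assumes a fixed record length of 16 bytes, and REC_LEN, MAX_RECS and the function names are all hypothetical. The unit of I/O is always a whole record: a read returns one record, and a write either overwrites an existing record or appends a new one.

```c
#include <string.h>

#define REC_LEN  16     /* hypothetical fixed record length, in bytes */
#define MAX_RECS 64     /* hypothetical capacity of this toy file */

/* A toy "record sequence" file: I/O is always in whole records. */
struct record_file {
    char recs[MAX_RECS][REC_LEN];
    size_t count;                      /* number of records in the file */
};

/* Read record n into out; returns 0 on success, -1 if n is past the end. */
int read_record(const struct record_file *f, size_t n, char out[REC_LEN])
{
    if (n >= f->count)
        return -1;
    memcpy(out, f->recs[n], REC_LEN);
    return 0;
}

/* Overwrite record n if it exists, or append it if n == count. */
int write_record(struct record_file *f, size_t n, const char in[REC_LEN])
{
    if (n > f->count || n >= MAX_RECS)
        return -1;                     /* no holes, no overflow */
    memcpy(f->recs[n], in, REC_LEN);
    if (n == f->count)
        f->count++;                    /* a new record was appended */
    return 0;
}
```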

Tree

A file consists of a tree of records, not necessarily all the same length, each containing a key field in a fixed position in the record. The tree is sorted on the key field, allowing rapid searching for a particular key.

New records can be added to the file, with the OS, not the user, deciding where to place them.

3. File types

UNIX and Windows have regular files and directories.

UNIX also has character special files and block special files.

Regular files contain user information.

Directories are system files for maintaining the structure of the file system.

Character special files are related to input/output and are used to model serial I/O devices such as terminals, printers and networks.

Block special files are used to model disks.
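On a POSIX system a program can ask which of these types a file is. The following is a minimal sketch using stat() and the standard S_ISREG, S_ISDIR, S_ISCHR and S_ISBLK macros. The enum and the function name classify_path are hypothetical.

```c
#include <sys/stat.h>

/* Hypothetical names for the file types discussed above. */
enum file_kind { KIND_ERROR, KIND_REGULAR, KIND_DIRECTORY,
                 KIND_CHAR_SPECIAL, KIND_BLOCK_SPECIAL, KIND_OTHER };

/* Classify a path from the file-type bits that stat() reports in st_mode.
 * stat() follows symbolic links, so a link is classified by its target. */
enum file_kind classify_path(const char *path)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return KIND_ERROR;                              /* e.g. no such file */
    if (S_ISREG(st.st_mode)) return KIND_REGULAR;
    if (S_ISDIR(st.st_mode)) return KIND_DIRECTORY;
    if (S_ISCHR(st.st_mode)) return KIND_CHAR_SPECIAL;  /* terminals, /dev/null */
    if (S_ISBLK(st.st_mode)) return KIND_BLOCK_SPECIAL; /* disks */
    return KIND_OTHER;                                  /* FIFOs, sockets, ... */
}
```

On Linux, for example, /dev/null is a character special file, and "/" is a directory.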

Regular files are generally either ASCII files or binary files.

ASCII files consist of lines of text.

The great advantage of ASCII files is that they can be displayed and printed as is, and they can be edited with any text editor.

If large numbers of programs use ASCII files for input and output, it is easy to connect the output of one program to the input of another.

Binary files are not ASCII text: listing them on a printer gives an incomprehensible listing full of random junk.

Binary files usually have some internal structure known to the programs that use them.

Figure 2 (a) An executable file. (b) An archive

Although an executable file is technically just a sequence of bytes, the OS will execute it only if it has the proper format.

The executable file in Figure 2(a) has five sections:

1. Header
2. Text
3. Data
4. Relocation bits
5. Symbol table

The header starts with a so-called magic number, identifying the file as an executable file.

It is followed by the sizes of the various pieces of the file, the address at which execution starts, and some flag bits.

After the header come the text and data of the program itself.

These are loaded into memory and relocated using the relocation bits.

The symbol table is used for debugging.
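The magic-number check can be sketched in a few lines of C. The layout in Figure 2(a) is that of an older UNIX format. As an assumption, this sketch uses the ELF format of modern Linux systems instead, whose header begins with the four bytes 0x7F 'E' 'L' 'F'. The function name has_elf_magic is hypothetical.

```c
#include <stddef.h>

/* Check whether a file header starts with the ELF magic number.
 * ELF is used here only as a real-world example of a magic number;
 * it is not the format shown in Figure 2(a). */
int has_elf_magic(const unsigned char *header, size_t len)
{
    static const unsigned char magic[4] = { 0x7F, 'E', 'L', 'F' };
    if (len < sizeof magic)
        return 0;                      /* too short to be an executable */
    for (size_t i = 0; i < sizeof magic; i++)
        if (header[i] != magic[i])
            return 0;
    return 1;
}
```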

A second example of a binary file is an archive, shown in Figure 2(b).

It consists of a collection of library procedures (modules), compiled but not linked.

Each one is prefaced by a header telling its name, creation date, owner, protection code and size.

Every OS must recognize at least one file type: its own executable file.

Some systems go further and enforce file types strictly. Consider a system in which program output files automatically get the extension .dat (data file).

If a user writes a program that reads in a C program, transforms it (for example, with a formatter) and writes out the transformed program, the output file will be of type .dat.

If the user then tries to give this file to the C compiler, the system will refuse to compile it because it has the wrong extension.

Even attempts to copy file.dat to file.c will be rejected by the system as invalid.

4. File access

Early operating systems provided only one kind of file access: sequential access.

In these systems, a process could read all the bytes or records in a file in order, starting at the beginning, but could not skip around and read them out of order.

Sequential files could be rewound, however, so they could be read as often as needed.

Sequential files were convenient when the storage medium was magnetic tape rather than disk.

Random access files

Files whose bytes or records can be read in any order are called random access files. They are required by many applications.

Example: database systems.

Two methods are used for specifying where to start reading. In the first, every read operation gives the position in the file to start reading at.

In the second, a special seek operation is provided to set the current position.

After a seek, the file can be read sequentially from the new current position.
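The seek-based method can be sketched with the POSIX calls open(), lseek(), read() and close(). The helper name read_at is hypothetical.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Random access in the "seek" style: set the current position with
 * lseek(), then read sequentially from there.
 * Returns the number of bytes read (0 at end of file), or -1 on error. */
ssize_t read_at(const char *path, off_t offset, char *buf, size_t n)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    ssize_t got = -1;
    if (lseek(fd, offset, SEEK_SET) == offset)  /* set current position */
        got = read(fd, buf, n);                 /* read from there on   */
    close(fd);
    return got;
}
```

POSIX also provides pread(fd, buf, n, offset), which corresponds to the first method. The position is passed with each read, and the file's current position is left unchanged.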

5. File attributes

Every file has a name and its data. In addition, the OS associates other information with each file, such as the date and time the file was last modified and the file's size.

These extra items are called file attributes; some people call them metadata (data about data).

The list of attributes varies from system to system.

Attribute            Meaning
Protection           Who can access the file and in what way
Password             Password needed to access the file
Creator              ID of the person who created the file
Owner                Current owner
Read-only flag       0 for read/write; 1 for read only
Hidden flag          0 for normal; 1 for do not display in listings
System flag          0 for normal files; 1 for system file
Archive flag         0 for has been backed up; 1 for needs to be backed up
ASCII/binary flag    0 for ASCII file; 1 for binary file
Random access flag   0 for sequential access only; 1 for random access
Temporary flag       0 for normal; 1 for delete file on process exit
Lock flags           0 for unlocked; nonzero for locked
Record length        Number of bytes in a record
Key position         Offset of the key within each record
Key length           Number of bytes in the key field
Creation time        Date and time the file was created
Time of last access  Date and time the file was last accessed
Time of last change  Date and time the file was last changed
Current size         Number of bytes in the file
Maximum size         Number of bytes the file may grow to
Figure 3 Some possible file attributes

6. File operations

The most common system calls relating to files are:

a. Create: the file is created with no data. The purpose of the call is to announce that the file is coming and to set some of its attributes.
b. Open: before using a file, a process must open it. The purpose of the open call is to allow the system to fetch the attributes and the list of disk addresses into main memory for rapid access on later calls.
c. Close: when all the accesses are finished, the attributes and disk addresses are no longer needed, so the file should be closed to free up internal table space.
d. Read: data are read from the file, usually starting at the current position.
e. Write: data are written to the file, again usually at the current position. If the current position is the end of the file, the file's size increases.
f. Append: a restricted form of write that can only add data to the end of the file.
g. Seek: for random access files, repositions the file pointer to a specific place in the file.
h. Get attributes: processes often need to read file attributes to do their work. For example, the UNIX make program, commonly used to manage software development projects consisting of many source and object files, examines the files' attributes to arrange for the minimum number of compilations required to bring everything up to date.
i. Set attributes: some attributes are user-settable and can be changed after the file has been created.
j. Delete: when a file is no longer needed, it has to be deleted to free up disk space; there is a system call for this purpose.
k. Rename: changes the name of an existing file.
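Several of these operations work together in a small file-copy routine. This sketch uses the POSIX calls open() (with O_CREAT to create the output file), read(), write() and close(). The name copy_file and the 4-KB buffer size are arbitrary choices for illustration.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

#define BUF_SIZE 4096                  /* arbitrary buffer size */

/* Copy src to dst using open/create, read, write and close.
 * Returns 0 on success, -1 on any error. */
int copy_file(const char *src, const char *dst)
{
    int in = open(src, O_RDONLY);                             /* open   */
    if (in < 0)
        return -1;
    int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);  /* create */
    if (out < 0) {
        close(in);
        return -1;
    }
    char buf[BUF_SIZE];
    ssize_t n;
    int rc = 0;
    while ((n = read(in, buf, sizeof buf)) > 0) {             /* read   */
        if (write(out, buf, (size_t)n) != n) {                /* write  */
            rc = -1;           /* a short write is treated as an error */
            break;
        }
    }
    if (n < 0)
        rc = -1;               /* read failed */
    close(in);                                                /* close  */
    if (close(out) != 0)
        rc = -1;
    return rc;
}
```

For simplicity, a short write is treated as an error rather than retried.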

Directories

To keep track of files, file systems normally have directories or folders.

1. Single level directory system

The simplest form of directory system is a single directory containing all the files, sometimes called the root directory.

Its advantages are simplicity and the ability to locate files quickly. It is used in simple embedded devices such as telephones, digital cameras and some portable music players.

Figure 4 A single level directory system containing 4 files

2. Hierarchical directory system

With this approach there can be as many directories as are needed to group the files in a natural way.

If multiple users share a common file server, as is the case on many company networks, each user can have a private root directory for his or her own hierarchy.

Figure 5 A hierarchical directory system

3. Path names

When the file system is organized as a directory tree, some way is needed for specifying file names.

Absolute path name -> consists of the path from the root directory to the file
Example: /usr/ast/mailbox
o The root directory contains a subdirectory usr, which contains a subdirectory ast, which contains the file mailbox
o Absolute path names always start at the root directory and are unique
o In UNIX the components of the path are separated by "/"
   - Example: /usr/ast/mailbox
o In Windows the components of the path are separated by "\"
   - Example: \usr\ast\mailbox
o In MULTICS the components of the path are separated by ">"
   - Example: >usr>ast>mailbox
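Splitting an absolute path name into its components works the same way whatever the separator is. The following is a minimal sketch that takes the separator as a parameter. split_path and the MAX_NAME limit are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define MAX_NAME 64            /* hypothetical maximum component length */

/* Split an absolute path into components, given the system's separator
 * ('/' for UNIX, '\\' for Windows, '>' for MULTICS). Copies at most max
 * components into comps and returns the number found, or -1 if the path
 * is not absolute or a component is too long. */
int split_path(const char *path, char sep, char comps[][MAX_NAME], int max)
{
    if (path[0] != sep)
        return -1;                     /* absolute paths start at the root */
    int count = 0;
    const char *p = path + 1;
    while (*p != '\0') {
        const char *end = strchr(p, sep);
        size_t len = end ? (size_t)(end - p) : strlen(p);
        if (len >= MAX_NAME)
            return -1;
        if (len > 0) {                 /* skip empty components ("//") */
            if (count < max) {
                memcpy(comps[count], p, len);
                comps[count][len] = '\0';
            }
            count++;
        }
        p += len;
        if (*p == sep)
            p++;                       /* step over the separator */
    }
    return count;
}
```

Skipping empty components caused by doubled separators is an assumption of this sketch. Real systems differ in how they treat them.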
4. Directory operations
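a. Create: a directory is created. It is empty except for dot and dotdot, which are put there automatically by the system (or, in a few cases, by the mkdir program).
b. Delete: a directory is deleted. Only an empty directory can be deleted; a directory containing only dot and dotdot is considered empty.
c. Opendir: directories can be read, for example to list all the files in a directory. Before a directory can be read, it must be opened.
d. Closedir: when a directory has been read, it should be closed to free up internal table space.
e. Readdir: this call returns the next entry in an open directory. Formerly, it was possible to read directories using the ordinary read system call, but that forces the programmer to know the internal structure of directories.
f. Link: allows a file to appear in more than one directory.
g. Unlink: a directory entry is removed. If the file being unlinked is present in only one directory (the normal case), it is removed from the file system; if it is present in several directories, only the specified path name is removed.
h. Rename: renames a directory.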
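On POSIX systems, Opendir, Readdir and Closedir correspond to the library functions opendir(), readdir() and closedir(). The following is a minimal sketch that counts the entries of a directory, skipping dot and dotdot. The name count_entries is hypothetical.

```c
#include <dirent.h>
#include <string.h>

/* Count the entries in a directory, skipping "." and "..".
 * Returns the count, or -1 if the directory cannot be opened. */
int count_entries(const char *path)
{
    DIR *d = opendir(path);                    /* Opendir  */
    if (d == NULL)
        return -1;
    int count = 0;
    struct dirent *e;
    while ((e = readdir(d)) != NULL) {         /* Readdir  */
        if (strcmp(e->d_name, ".") != 0 && strcmp(e->d_name, "..") != 0)
            count++;
    }
    closedir(d);                               /* Closedir */
    return count;
}
```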

File System Implementation

The part of the OS dealing with files is known as the file system.

File Allocation Table (FAT)

The FAT provides a mapping between clusters (groups of disk sectors), which are the basic unit of logical data storage at the OS level, and their physical locations in terms of cylinders, tracks and sectors.

The FAT contains one entry for every cluster in the volume, and a file's directory entry holds the number of its starting cluster.

Each FAT entry contains the number of the next cluster in the file, or an end-of-file indicator (such as 0xFFFF in FAT-16).

The physical addressing is done by the drive's hardware controller.

FAT-12: maximum partition size of 8 MB

FAT-16: maximum partition size of 2 GB (1984)

Because the maximum number of clusters per partition is fixed, the bigger the hard disk partition, the bigger the cluster size must be.

The OS assigns a unique number to each cluster to keep track of files.

A cluster that is marked as being in use even though no file actually uses it is called a lost cluster.

Utilities such as ScanDisk can identify lost clusters.

1. File System Layout

File systems are stored on disks.

Most disks can be divided into one or more partitions, with an independent file system on each partition.

Sector 0 of the disk is called the Master Boot Record (MBR) and is used to boot the computer.

The end of the MBR contains the partition table, which gives the starting and ending addresses of each partition.

One of the partitions in the table is marked as the active partition.

When the computer is booted, the BIOS reads in and executes the MBR.

The MBR program locates the active partition, reads in its first block, called the boot block, and executes it.

The program in the boot block loads the operating system contained in that partition.

Superblock: contains the key parameters of the file system and is read into memory when the computer is booted or when the file system is first accessed.

Typical information in the superblock includes a magic number to identify the file system type and the number of blocks in the file system.

Next comes information about free blocks in the file system.

Next come the i-nodes, an array of data structures, one per file, telling all about the file.

Then comes the root directory.

The rest of the disk contains all the other directories and files.

Figure 6: A possible file system layout
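The boot steps above can be illustrated by locating the active partition in an MBR sector. This sketch assumes the classic PC MBR layout:

- four 16-byte partition entries start at byte 446;
- a status byte of 0x80 marks the active partition;
- bytes 8 to 11 of an entry hold its starting sector, little-endian;
- the sector ends with the signature 0x55 0xAA.

The name find_active_partition is hypothetical.

```c
#include <stdint.h>

/* Classic PC MBR layout (assumed): 446 bytes of boot code, four 16-byte
 * partition entries, then the signature 0x55 0xAA at bytes 510-511. */
#define PT_OFFSET   446
#define PT_ENTRIES  4
#define PT_ENTRY_SZ 16

/* Return the index (0-3) of the active partition, or -1 if the sector has
 * no MBR signature or no partition is marked active. If start_lba is not
 * NULL, the active partition's first sector (where its boot block lives)
 * is stored there. */
int find_active_partition(const uint8_t mbr[512], uint32_t *start_lba)
{
    if (mbr[510] != 0x55 || mbr[511] != 0xAA)
        return -1;
    for (int i = 0; i < PT_ENTRIES; i++) {
        const uint8_t *e = mbr + PT_OFFSET + i * PT_ENTRY_SZ;
        if (e[0] == 0x80) {            /* status byte: active */
            if (start_lba)
                *start_lba = (uint32_t)e[8] | (uint32_t)e[9] << 8 |
                             (uint32_t)e[10] << 16 | (uint32_t)e[11] << 24;
            return i;
        }
    }
    return -1;
}
```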

2. Implementing files

Contiguous Allocation

The simplest allocation scheme is to store each file as a contiguous run of disk blocks.

On a disk with 1-KB blocks, a 50-KB file would be allocated 50 consecutive blocks.

On a disk with 2-KB blocks, a 50-KB file would be allocated 25 consecutive blocks.

Figure 7 (a) Contiguous allocation of disk space for seven files. (b) The state of the disk after files D and F have been removed

In Figure 7(a), the first 40 disk blocks are shown, starting with block 0 on the left.

Initially the disk was empty; then files of different lengths were written to it, each occupying a number of blocks that depends on its size.

First, file A, four blocks long, was written starting at the beginning of the disk (block 0); then the three-block file B was written starting right after the last block of file A.

Each file begins at the start of a new block, so if a file occupies only half of its last block, the rest of that block is wasted and the next file starts in a new block.

Two advantages

First, it is simple to implement, because keeping track of where a file's blocks are is reduced to remembering two numbers: the disk address of the first block and the number of blocks in the file.

Second, the read performance is excellent, because the entire file can be read from the disk in a single operation.
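Both advantages follow from the fact that a contiguous file is described by just two numbers. This sketch computes how many blocks a file needs and which disk block holds a given byte, without any disk reads. struct extent and the function names are hypothetical.

```c
#include <stddef.h>

/* Contiguous allocation: a file is described by two numbers. */
struct extent {
    size_t first_block;   /* disk address of the first block */
    size_t num_blocks;    /* number of blocks in the file    */
};

/* Blocks needed for a file of `size` bytes; the last block may be only
 * partly used, and the rest of it is wasted. */
size_t blocks_needed(size_t size, size_t block_size)
{
    return (size + block_size - 1) / block_size;   /* ceiling division */
}

/* Disk block holding byte `offset` of the file, found by arithmetic
 * alone. Returns (size_t)-1 if offset is past the end of the file. */
size_t block_for_offset(struct extent f, size_t offset, size_t block_size)
{
    size_t rel = offset / block_size;
    if (rel >= f.num_blocks)
        return (size_t)-1;
    return f.first_block + rel;
}
```

blocks_needed(50 * 1024, 1024) is 50 and blocks_needed(50 * 1024, 2048) is 25, matching the figures given earlier. Given the description of Figure 7(a), file B occupies blocks 4 to 6, so it would be described as { 4, 3 }.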

Drawback

When a new file is created, its final size must be known in advance in order to allocate space for it.

Example: imagine a text editor that, before the user types a document, asks how many bytes the final document will be. The question must be answered or the program will not continue.

If the size given turns out to be too small, the program has to terminate prematurely, because the hole it was given is full and there is no place to put the rest of the file.

If the user tries to avoid this problem by giving an unrealistically large size, the system may not be able to find such a big hole, and the user will be told that the file cannot be created.

Over time the disk becomes fragmented.

When files are removed, their blocks become free, leaving holes scattered across the disk.

The disk is not compacted on the spot, because compaction is very time-consuming: all the blocks in use must be copied so that the holes can be gathered into one.

3. Linked list allocation

The second method of storing files is to keep each one as a linked list of disk blocks.

Figure 8 Storing a file as a linked list of disk blocks.

The first word of each block is used as a pointer to the next one. The rest of the block is for data.

Only the address of the first block needs to be stored in the directory entry.

Reading a file sequentially is straightforward, but random access is extremely slow: to get to block n, the operating system has to start at the beginning and read the n - 1 blocks before it, one at a time.
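The cost of random access shows up in a toy model of linked list allocation. In this sketch each block's first word holds the number of the next block, and -1 marks the end of the chain. The structure, names and end marker are all hypothetical. Each step of the loop stands for one disk read.

```c
#include <stddef.h>

#define NO_BLOCK (-1)          /* hypothetical end-of-chain marker */

/* A toy disk block: the first word points to the next block, and the
 * rest holds data. With a 1-KB block, the data area is 1024 - sizeof(int)
 * bytes, which is no longer a power of two. */
struct disk_block {
    int next;                          /* pointer word */
    char data[1024 - sizeof(int)];     /* file data    */
};

/* Find the disk block holding logical block n of a file whose first
 * block is `first`. Each step must "read" a block to learn its successor,
 * so reaching block n costs n reads. Returns NO_BLOCK if the file is
 * shorter than n + 1 blocks. */
int nth_block(const struct disk_block disk[], int first, int n, int *reads)
{
    int b = first;
    *reads = 0;
    for (int i = 0; i < n && b != NO_BLOCK; i++) {
        b = disk[b].next;              /* one disk read per step */
        (*reads)++;
    }
    return b;
}
```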

Also, the amount of data storage in a block is no longer a power of two, because the pointer takes up a few bytes.

This peculiar size is less efficient, because many programs read and write in blocks whose size is a power of two.

With the first few bytes of each block occupied by a pointer, reading a full block's worth of data requires fetching and concatenating information from two disk blocks, which adds overhead for the extra copying.

4. Linked list allocation using a table in memory

Both disadvantages of linked list allocation can be eliminated by taking the pointer word from each disk block and putting it in a table in memory.

Figure 9 Linked list allocation using a file allocation table in main memory.

In Figure 9, file A uses disk blocks 4, 7, 2, 10 and 12, in that order, and file B uses disk blocks 6, 3, 11 and 14, in that order. Using the table, we can start with block 4 and follow the chain all the way to the end. The same can be done starting with block 6. Both chains are terminated with a special marker (-1) that is not a valid block number. Such a table in main memory is called a FAT (File Allocation Table).

Because the whole chain is in memory, it can be followed without making any disk references, so random access becomes much faster.

The directory entry needs only a single integer (the starting block number) to locate all the blocks of the file, however large the file is.
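Following a chain through the in-memory table can be sketched as below. fat_chain, FAT_SIZE and FAT_EOF are hypothetical names, and -1 is the end marker as in Figure 9. The loop touches only the table, never the data blocks.

```c
#include <stddef.h>

#define FAT_EOF  (-1)     /* end-of-chain marker, as in Figure 9 */
#define FAT_SIZE 16       /* toy disk with 16 blocks (assumed)    */

/* Follow a file's chain through the in-memory FAT, storing up to max
 * block numbers in out. Returns the number of blocks in the file.
 * The count < FAT_SIZE guard stops the loop on a corrupted (cyclic)
 * table. */
int fat_chain(const int fat[FAT_SIZE], int start, int out[], int max)
{
    int count = 0;
    for (int b = start; b != FAT_EOF && count < FAT_SIZE; b = fat[b]) {
        if (count < max)
            out[count] = b;
        count++;
    }
    return count;
}
```

Setting fat[4] = 7, fat[7] = 2, fat[2] = 10, fat[10] = 12 and fat[12] = -1 reproduces file A of Figure 9. fat_chain(fat, 4, out, FAT_SIZE) then returns 5, and out holds 4, 7, 2, 10, 12.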

Disadvantage

The entire table must be in memory all the time.

With a 200-GB disk and a 1-KB block size, the table needs 200 million entries, one for each of the 200 million disk blocks.

Each entry must be at least 3 bytes long.

Thus the table takes up 200 million × 3 bytes = 600 MB of main memory all the time, which is not a good option.
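The arithmetic can be written out explicitly. This assumes decimal units (1 GB = 10^9 bytes, 1 KB = 10^3 bytes), as the round figures above imply:

```latex
\[
\frac{200\ \text{GB}}{1\ \text{KB per block}}
  = \frac{200 \times 10^{9}\ \text{bytes}}{10^{3}\ \text{bytes}}
  = 200 \times 10^{6}\ \text{blocks (FAT entries)}
\]
\[
200 \times 10^{6}\ \text{entries} \times 3\ \text{bytes per entry}
  = 600 \times 10^{6}\ \text{bytes} = 600\ \text{MB}
\]
```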

5. I-nodes

The last method for keeping track of which blocks belong to which file is to associate with each file a data structure called an i-node (index node), which lists the attributes and disk addresses of the file's blocks.

Figure 10 An example i-node

Given the i-node, it is possible to find all the blocks of the file.

Advantage

The i-node needs to be in memory only when the corresponding file is open.

The i-node scheme requires an array in memory whose size is proportional to the maximum number of files that may be open at once, not to the size of the disk as with the FAT: if each i-node takes n bytes and at most k files may be open, the array needs only kn bytes.

Drawback

If each i-node has room for a fixed number of disk addresses, what happens when a file grows beyond this limit?

One solution, shown in Figure 10, is to reserve the last disk address not for a data block but for the address of a block containing more disk block addresses.
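The lookup of a logical block in an i-node with one indirect block can be sketched as follows. The numbers are assumptions for illustration: 10 direct addresses, and 256 addresses per indirect block (a 1-KB block holding 4-byte disk addresses). struct inode and inode_block are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

#define N_DIRECT        10     /* direct addresses in the i-node (assumed) */
#define ADDRS_PER_BLOCK 256    /* 1-KB block / 4-byte address (assumed)    */

struct inode {
    uint32_t size;                 /* attribute: file size in bytes       */
    uint32_t direct[N_DIRECT];     /* disk addresses of the first blocks  */
    uint32_t indirect;             /* disk address of the indirect block  */
};

/* Map logical block n to a disk address. `indirect_block` stands for the
 * contents of the block at ip->indirect, which a real file system would
 * first read from disk. Returns 0 (assumed to mean "no block") if n is
 * beyond what this i-node can address. */
uint32_t inode_block(const struct inode *ip, const uint32_t *indirect_block,
                     uint32_t n)
{
    if (n < N_DIRECT)
        return ip->direct[n];              /* found in the i-node itself */
    n -= N_DIRECT;
    if (n < ADDRS_PER_BLOCK && indirect_block != NULL)
        return indirect_block[n];          /* one extra level of lookup */
    return 0;
}
```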

Implementing directories

The main function of the directory system is to map the ASCII name of a file onto the information needed to locate its data.

A closely related issue is where the attributes should be stored.

Every file system maintains file attributes, such as each file's owner and creation time, and they must be stored somewhere.

One possibility is to store them directly in the directory entry.

In this simple design, a directory consists of a list of fixed-size entries, one per file, containing a fixed-length file name, the file attributes and one or more disk addresses.

Figure 11 (a) A simple directory containing fixed-size entries with the disk addresses and attributes in the directory entry. (b)
A directory in which each entry just refers to an i-node.

For systems that use i-nodes, another possibility is to store the attributes in the i-nodes rather than in the directory entries. In that case, the directory entry can be shorter: just a file name and an i-node number.

To support long, variable-length file names, one alternative is to give up the idea that all directory entries are the same size.

Each directory entry then contains a fixed portion, starting with the length of the entry, followed by data with a fixed format that includes the owner, creation time, protection information and other attributes.

This fixed-length header is followed by the actual file name, however long it may be.

Figure 12 Two ways of handling long file names in a directory. (a) In-line. (b) In a heap.

In Figure 12(a) there are three files, project-budget, personnel and foo. Each file name is terminated by a special character (usually 0), shown in the figure as a box with a cross in it.

Disadvantage

When a file is removed, a variable-size gap is left in the directory, and the next file to be entered may not fit into it.

Another way to handle variable-length names is to make the directory entries themselves fixed-length and keep the file names together in a heap at the end of the directory, as shown in Figure 12(b).

Advantage: when an entry is removed, the next file entered will always fit there. A hash table on the file names can also be used to speed up the search in large directories.

Disadvantage: more complex administration.

Shared files

Figure 13 File system containing a shared file.

One of C's files is now also present in one of B's directories.

The connection between B's directory and the shared file is called a link.

The file system itself is now a Directed Acyclic Graph (DAG) rather than a tree.

Sharing files is convenient, but it also introduces some problems.

If directories contain disk addresses, then a copy of the disk addresses will have to be made in B's directory when the file is linked.

If either B or C subsequently appends to the file, the new blocks will be listed only in the directory of the user doing the appending.

The changes will not be visible to the other user, so the purpose of sharing is defeated.

There are two ways to solve this problem:

1. Disk blocks are not listed in directories but in a small data structure associated with the file itself, the i-node. The directories then just point to the i-node (they contain the i-node number).

2. B links to one of C's files by having the system create a new file, of type LINK, in B's directory. The new file contains just the path name of the file to which it is linked.

When B reads from the linked file, the OS sees that the file being read from is of type LINK, looks up the name of the file, and reads that file. This approach is called symbolic linking.

Drawback of the i-node approach:

Suppose the owner of the file (C) wants to remove it. If the system removes the file and clears its i-node, B's directory entry will point to an invalid i-node, and if the i-node is later reassigned to another file, B's link will point to the wrong file.

The system can tell from the link count in the i-node that the file is still in use, but there is no easy way to find all the directory entries pointing to the i-node in order to erase them, because there may be thousands of directories in a system.

One possible solution is to remove C's directory entry but leave the i-node intact; removing a directory entry does not remove the i-node.

Now B has the only directory entry for a file that is still owned by C.

If the file keeps growing through B's write operations, C is still charged for it (for example, against C's disk quota), and this problem continues until B deletes the file.

With symbolic links this problem does not arise, because only the true owner has a pointer to the i-node.

Users who have linked to the file just have path names, not i-node pointers.

When the owner removes the file, it is destroyed; subsequent attempts by other users to access the file via a symbolic link will fail, because the system cannot locate the file.

Drawback of symbolic links

Additional overhead: the file containing the path must be read, and then the path must be parsed and followed, component by component, until the i-node is reached. All of this activity may require many additional disk accesses.

Advantage

Symbolic links can be used to link to files on machines anywhere in the world, by providing the network address of the machine where the file resides in addition to its path on that machine.
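The difference between the two kinds of links can be observed on most UNIX file systems with the POSIX calls link(), symlink(), unlink(), stat() and lstat(). The following is a minimal sketch. The helper names make_file, link_count and is_dangling_symlink are hypothetical.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Create an empty regular file (helper for the demonstration). */
int make_file(const char *path)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    return close(fd);
}

/* Number of directory entries (hard links) pointing at the file's i-node,
 * or -1 if the path cannot be reached. */
long link_count(const char *path)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    return (long)st.st_nlink;
}

/* 1 if path is a symbolic link whose target no longer exists.
 * lstat() examines the link itself; stat() follows the stored path. */
int is_dangling_symlink(const char *path)
{
    struct stat st;
    if (lstat(path, &st) != 0 || !S_ISLNK(st.st_mode))
        return 0;
    return stat(path, &st) != 0;
}
```

After link(target, hard), the target's link count is 2. After unlink(target), the hard link still reaches the file and its count drops back to 1. The symbolic link, however, is left dangling.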

Virtual file system

UNIX systems attempt to integrate multiple file systems into a single structure.

For example, a Linux system could have:

1. Ext2 as the root file system
2. An Ext3 partition mounted on /usr
3. A second hard disk with a ReiserFS file system mounted on /home
4. An ISO 9660 CD-ROM mounted on /mnt

From the user's point of view, there is a single file system hierarchy.

UNIX systems use the concept of a virtual file system (VFS) to integrate multiple file systems into an orderly structure.

Figure 14 Position of virtual file system

All system calls relating to files are directed to the virtual file system for initial processing.

These calls, coming from user processes, are the standard POSIX calls, such as OPEN, READ, WRITE, LSEEK and so on.

The VFS has an upper interface to user processes, which is the POSIX interface.

The VFS has a lower interface to the concrete file systems, which is labeled the VFS interface in Figure 14.

This interface consists of several dozen function calls that the VFS can make to each file system to get the work done.

Example

One such function reads a specific block from disk, puts it in the file system's buffer cache and returns a pointer to it.

The VFS supports several key objects:

1. Superblock – describes a file system
2. V-node – describes a file
3. Directory – describes a file system directory
4. VFS internal data structures
a. Mount table
b. Array of file descriptors to keep track of all open files in the user processes

How the VFS works

1. When the system is booted, the root file system is registered with the VFS.
2. When other file systems are mounted, either at boot time or during operation, they too must register with the VFS.
3. During registration, a file system provides the list of addresses of the functions the VFS requires, either as one long vector (table) or as several of them, one per VFS object, as the VFS demands.
4. Once a file system is registered with the VFS, the VFS knows how to, say, read a block from it: it simply calls the appropriate function in the vector supplied by the file system.

Example

If a file system has been mounted on /usr and a process makes the system call

fd = open("/usr/include/unistd.h", O_RDONLY);

then, while parsing the path, the VFS sees that a new file system has been mounted on /usr and locates its superblock by searching the list of superblocks of mounted file systems.

Having done this, it can find the root directory of the mounted file system and look up the path include/unistd.h there.

The VFS then creates a v-node and makes a call to the concrete file system to return all the information in the file's i-node.

This information is copied into the v-node (in RAM), along with a pointer to the table of functions to call for operations on the v-node, such as read, write, close and so on.

After the v-node has been created, the VFS makes an entry in the file descriptor table for the calling process and sets it to point to the new v-node.

Finally, the VFS returns the file descriptor to the caller so it can use it to read, write and close the file.

Later, when the process does a read using the file descriptor, the VFS locates the v-node from the process and file descriptor tables and follows the pointer to the table of functions, all of which are addresses within the concrete file system on which the requested file resides.

Figure 15 A simplified view of the data structures and code used by the VFS and concrete file system to do a read.

Starting with the caller's process number and the file descriptor, successively the v-node, the read function pointer and the access function within the concrete file system are located.
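The chain from file descriptor to concrete file system can be sketched with function pointers in C. These structures are hypothetical and heavily simplified; they are not the real Linux definitions. In particular, the file position is passed explicitly instead of being kept in an open-file structure. A toy in-memory "concrete file system" stands in for a real one.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified VFS structures. */
struct vnode;

struct vnode_ops {                         /* supplied at registration */
    long (*read)(struct vnode *vn, char *buf, size_t n, long pos);
};

struct vnode {
    const struct vnode_ops *ops;           /* points into the concrete FS */
    const char *data;                      /* stand-in for file contents  */
    long size;
};

#define MAX_FDS 8
struct process {
    struct vnode *fd_table[MAX_FDS];       /* file descriptor -> v-node */
};

/* Toy concrete file system: "reads" by copying from memory. */
static long memfs_read(struct vnode *vn, char *buf, size_t n, long pos)
{
    if (pos >= vn->size)
        return 0;                          /* at or past end of file */
    if ((long)n > vn->size - pos)
        n = (size_t)(vn->size - pos);
    memcpy(buf, vn->data + pos, n);
    return (long)n;
}
static const struct vnode_ops memfs_ops = { memfs_read };

/* VFS side of read(): process + fd -> v-node -> function table ->
 * the concrete file system's read function. */
long vfs_read(struct process *p, int fd, char *buf, size_t n, long pos)
{
    if (fd < 0 || fd >= MAX_FDS || p->fd_table[fd] == NULL)
        return -1;                         /* bad file descriptor */
    struct vnode *vn = p->fd_table[fd];
    return vn->ops->read(vn, buf, n, pos); /* indirect call into the FS */
}
```

The VFS never needs to know how memfs_read works. It only follows the pointer that the concrete file system registered.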
