Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
0% found this document useful (0 votes)
14 views60 pages

Os - Unit 5

Download as pptx, pdf, or txt
Download as pptx, pdf, or txt
Download as pptx, pdf, or txt
You are on page 1/ 60

Unit -5

File Systems
[Chapter 4 from Tanenbaum 3rd edition]

Prachi Shah
IT Dept.
BVM
Contents
1. Files: Naming, Structure, Types, Access, Attributes, Operations

2. Directories: Single-level, Hierarchical Directory Systems, Path names,


Operations

3. File System Implementation: File system layout, Implementing Files,


Implementing Directories, and Shared Files

4. File System Management: File System Backups, File System Consistency

5. Example of File Systems: MS-DOS, UNIX.


File Systems
• Requirements for long-term information storage:
1. It must be possible to store a very large amount of information.
2. The information must survive the termination of the process using it.
3. Multiple processes must be able to access the information concurrently.

• Disks are used to store files


 Information is stored in blocks on the disks
 Can read and write blocks

• Use file system as an abstraction to deal with accessing the


information kept in blocks on a disk
File Systems
• Files are logical units of information created by a processes
• Thousands of them on a disk managed by the OS
• OS structures them, names them, access them, use them, protect
them, implement them and manage them
• Part of the operating system dealing with files is known as the File
System.
• Two ways of looking at file system
 User-how do we name a file, protect it, organize the files
 Implementation-how are they organized on a disk
• Start with user, then go to implementer
• The user point of view
 Naming
 Structure
 Directories
Naming
 One to 8 letters in all current OS’s
 Unix, MS-DOS (Fat16) file systems discussed
 In Unix, file naming is case-sensitive, whereas not in MS-DOS.
 FAT (16 and 32) were used in first Windows systems
 Latest Window systems use Native File System
Naming
 Many operating systems support two-part file names, with the two
parts separated by a period, as in prog.c.

 The part following the period is called the file extension and
usually indicates something about the file.

 In MS-DOS, for example, file names are 1 to 8 characters, plus an


optional extension of 1 to 3 characters.

 In UNIX, the size of the extension, if any, is up to the user, and a file
may even have two or more extensions, as in homepage.html.zip
Typical file extensions
File Structure

 Byte sequences
 Maximum flexibility-can put anything in
 Unix and Windows use this approach
 Fixed length records
 Tree of records- uses key field to find records in the tree
File Structure

Three kinds of files.


(a) Byte sequence. (b) Record sequence. (c) Tree.
File Types

1. Regular – contains user information

2. Directories – system files for maintaining structure of the


file system

3. Character special files – model serial I/O devices (e.g.


printers)

4. Block special files – model disks


1. Regular Files
a) ASCII
• Printable
• Can use pipes to connect programs if they produce / consume
ASCII
b) Binary File Types
• Two Unix examples
 Executable (magic field identifies file as being executable)
 Archive – compiled, not linked library procedures
• Every OS must recognize its own executable
Binary File Types (Early Unix)

(a) An executable file


(magic number
identifiies it as an
executable file)

(b)An archive-library
procedures compiled but
not linked
File Access

1. Sequential access – read from the beginning, can’t skip around


• Corresponds to magnetic tape

2. Random access – start where you want to start


• Came into play with disks
• Necessary for many applications, e.g. airline reservation
system
• Two methods are used for specifying where to start reading:
 read (gives position in file)
 seek (set the current position)
File Attributes (hypothetical OS)
Operations or System Calls for files
Name Purpose

1 Create with no data, sets some attributes

2 Delete to free disk space

3 Open after create, gets attributes and disk addresses into main memory

4 Close frees table space used by attributes and addresses


usually from current pointer position. Need to specify buffer into which data
5 Read
is placed
6 Write usually to current position (file size increases or data are overwritten)

7 Append at the end of the file

8 Seek puts file pointer at specific place in file. Read or write from that position on
e.g. make needs most recent modification times to arrange for group
9 Get Attributes
compilation
10 Set Attributes e.g. protection attributes

11 Rename change the name of an existing file


Directories

• Files which are used to organize a collection of files

• Also called folders in some OS’s

.
Single Level Directory Systems
A single-level directory system containing four files.

Used in devices like telephones, digital cameras and some


portable music players
Hierarchical Directory Systems
Path names
Windows \usr\ast\mailbox
UNIX /usr/ast/mailbox
MULTICS >USr>ast>mailbox

• Absolute: cp /usr/ast/mailbox /usr/ast/mailbox.bak


• to access specific file; no consideration of current directory
• Relative: cp mailbox mailbox.bak
• to do the working in current directory

• . Refers to current (working) directory


• .. Refers to parent of current directory
E.g. copy the file /usr/lib/dictionary to its own directory using the command
cp ../lib/dictionary
Path Names

A UNIX directory tree.


Unix cp commands involving dots
Cp ../lib/dictionary .

• .. says go to parent (usr)


• . says that target of the copy is current directory
• cp /usr/lib/dictionary dictionary works
• cp /usr/lib/dictionary /usr/ast/dictionary also works
Directory Operations
Name Purpose

1 Create creates directory

2 Delete directory has to be empty to delete it

3 Opendir Must be done before any operations on directory

4 Closedir Directory is closed to free up internal table space

5 Readdir returns next entry in open directory

6 Rename renames the directory

7 Link links file to another directory [Types: Hard link and Symbolic link]

8 Unlink Gets rid of directory entry


File Implementation
• Users are concerned with
 how files are named
 what operations are allowed on them
 what the directory tree looks like, and
 similar interface issues

• Implementors are interested in


 how files and directories are stored
 how disk space is managed, and
 how to make everything work efficiently and reliably
File System Layout
• Files stored on disks. Disks broken up into one or more partitions, with
separate file system on each partition
• Sector 0 of disk is the Master Boot Record (MBR); which is used to
boot the computer
• End of MBR has partition table. Has starting and ending addresses of
each partition.
• One of the partitions is marked active in the master boot table
• Boot computer => BIOS reads & executes MBR
• MBR finds active partition and reads in first block (boot block)
• Program in boot block locates the OS for that partition and reads it in
• All partitions start with a boot block
A Possible File System Layout

• Superblock – contains magic number to identify info about the file


system (e.g. type of file system, number of blocks, …)
• i-nodes – an array of data structures, one per file, telling all about the
file
Allocating Blocks to files / Implementing files
• Most important implementation issue – keeping track of which disk
blocks go with which file.

• Four Methods:
1. Contiguous allocation
2. Linked list allocation
3. Linked list using table
4. I-nodes
1. Contiguous Allocation

(a) Contiguous allocation of disk space for 7 files.


(b) The state of the disk after files D and F have been removed.
1. Contiguous Allocation
The good
• Easy to implement – just remember two numbers: the disk address
of the first block and the number of blocks in the file
• Read performance is great. Only need one seek to locate the first
block in the file. The rest is easy.

The bad-disk becomes fragmented over time

• CD-ROM’s use contiguous allocation because the file system size is


known in advance
• DVD’s store a few consecutive 1 GB files because standard for
DVD only allows a 1 GB file max
2. Linked List Allocation

Storing a file as a linked list of disk blocks

The good – Gets rid of fragmentation


The bad – Random access is slow. Need to chase pointers to get to a block
3. Linked lists using a table in memory

• Put pointers in table in memory


• That table is called File Allocation Table (FAT)

• The bad – table becomes really big


• E.g 200 GB disk with 1 KB blocks needs a 600 MB table
• Growth of the table size is linear with the growth of the disk size
3. Linked List Allocation Using a Table in Memory – The
Solution

Storing a file as a linked list of disk blocks Linked list allocation using a file
allocation table in main memory
4. I-nodes (Index-node)

• Keep data structure in memory only for active files


• Data structure lists disk addresses of the blocks and
attributes of the files
• K active files, N blocks per file => k*n blocks (maximum)
• Solves the growth problem

• How big is N?
• Solution: Last entry in table points to disk block which
contains pointers to other disk blocks
4. I-nodes (Index-node)

Figure 4-13. An example i-node.


Implementing Directories

 Open file, path name used to locate directory


 Directory specifies block addresses by providing
 Address of first block (contiguous)
 Number of first block (linked)
 Number of i-node
Implementing Directories

(a) fixed-size entries with the disk addresses and attributes (DOS)
(b) each entry refers to an i-node. Directory entry contains attributes. (Unix)
Implementing Directories

 How do we deal with variable length names?


 Problem is that names have gotten very long
 Two approaches
1. Fixed header followed by variable length names
2. Heap-pointer points to names
Implementing Directories

Two ways of handling long file names in a directory.


(a) In-line. (b) In a heap.
Shared Files

File system containing a shared file.


File systems is a directed acyclic tree (DAG)
Shared files

 If B or C adds new blocks, how does other owner find out?


 Use special i-node for shared files-indicates that file is
shared
 Use symbolic link - a special file put in B’s directory if C is
the owner. Contains the path name of the file to which it is
linked
I-node problem

 If C removes file, B’s directory still points to i-node for


shared file
 If i-node is re-used for another file, B’s entry points to
wrong i-node
 Solution is to leave i-node and reduce number of owners
I node problem and solution

(a) Situation prior to linking. (b) After the link is created. (c) After
the original owner removes the file.
Symbolic links

 Symbolic link solves problem


 Can have too many symbolic links and they take time to
follow
 Big advantage-can point to files on other machines
File System Management and Optimization-
the dirty work
 File System Backups
 File System Consistency
File System Backups (1)
• Backups to tape are generally made to handle one of two
potential problems:
 Recover from disaster (e.g. disk crash, pit bull eats
computer)
 Recover from stupidity (e.g. someone removes file by
stepping on keyboard)

• Moral:
(1) don’t allow anybody near computers (2) back up your files

• Tapes hold hundreds of gigabytes and are very cheap


Back up issues
1. Not to back up whole file system
 Can get binaries from manufacturer’s CD’s
 Temporary files don’t need to be backed up
 Special files (I/O) don’t need it
2. Incremental dumps: Complete dump weekly/monthly and daily dump of
modified files since big dump
 minimizes dumping time
 It makes recovery more complicated; because first the most
recent full dump has to be restored, and then, all the incremental
dumps in reverse order.
 need better algorithms
Back up issues
3. to compress data before dumping it
 but what if part of the tape is bad
4. hard to dump when system is active (i.e. being used)
 leads to inconsistency
 Algorithms have been devised for making rapid snapshots of the
file system state
5. many nontechnical problems can happen into an organization
 Suppose the best online security system in the world may be
useless if the system administrator keeps all the backup tapes in
his office and leaves it open and unguarded
Dumping strategies

1. Physical dump

2. Logical dump
1. Physical dump
 dump the whole thing
 The good – it is simple to implement & has great speed
 Don’t want to dump:
a) unused blocks – program needs access to the unused block list
and must write block number for used blocks on tape
b) bad blocks – Disk controller must detect and replace them or the
program must know where they are (they are kept in a bad block
area by the OS)
 The bad –
o Can’t skip a particular directory
o Can’t make incremental dumps
o Can’t restore individual files
2. Logical Dumps
o Starts at directory and recursively dumps all files/directories below it
which have changed since a given time

o Examine standard algorithm used by Unix. Dumps files/directories on


path to modified file/directory because
 Can restore path on different computer
 Have the ability to restore a single file
[Example of file /usr/jhs/proj/nr3/plans/summary in a directory whose
parent directories are removed]
Logical Dump Algorithm
o Uses bitmap indexed by i-node
o 4 phases
1) Phase 1 – starts at root and marks bits for modified
files and all directories (a)
2) Phase 2 – walks the tree, unmarks directories without
modified files in/under them (b)
3) Phase 3 – go through i-nodes and dump marked
directories (c)
4) Phase 4 – dump files (d)
File System Backups - Example
Restoring file on disk
o Start with empty file system on disk
o Restore most recent full dump. Directories first, then files.
o Then do restore incremental dumps (they are in order on the tape)
o Issues:
Free block list must be reconstructed. Easy to do-it is the
complement of the used blocks
Links to files have to be carefully restored
Files can have holes in them. Don’t want to restore files with lots of
zeroes in them
Pipes can’t be dumped
Tape density is not increasing => need lots of tapes and robots to get
at them.
File System Consistency
o Crash before blocks written out results in an inconsistent state
o It is critical if those blocks are i-node blocks, directory blocks, or
blocks containing the free list
o Need utility program to check for consistency in blocks and files (e.g.,
fsck in Unix, scandisk in Windows)
o Two types of consistency check: by blocks & by files
o For block consistency, it Uses two tables
1. How many times is block present in a file
2. How many times block is present in free list
 Device then reads all of the i-nodes, increments counters
File System Consistency – Example

Figure 4-27. File system states. (a) Consistent. (b) Missing block.
(c) Duplicate block in free list. (d) Duplicate data block.
File System Consistency – Example
o Missing block (b)-put on free list
o Block occurs twice on free list (c)-re-build free list
o Block occurs twice on blocks in use (d)- notify user. (If one
file is removed, then block appears on both lists)

o Solution to duplication:
Look at files instead of blocks
Use table of counters, one per file
The MS-DOS File System-directory entry

.
The MS-DOS File System –maximum partition size

Figure 4-32. Maximum partition size for different block sizes. The
empty boxes represent forbidden combinations.
The UNIX V7 File System (1)

Figure 4-33. A UNIX V7 directory entry.


The UNIX V7 File System (2)

Figure 4-34. A UNIX i-node.


The UNIX V7 File System (3)

Figure 4-35. The steps in looking up /usr/ast/mbox.

You might also like