
OS Unit-5


UNIT - V

File System Interface and Operations

Definition of a File:
A file is a named collection of related information that is recorded on secondary storage.
(or) A file is the smallest allotment of logical secondary storage.
(or) A file is a sequence of bits, bytes, lines, or records, the meaning of which is defined
by the file’s creator and user. Many different types of information may be stored in a file.

File Attributes
File attributes give the operating system information about a file and how it is
intended to be used.
A file’s attributes vary from one operating system to another but typically consist of
these:
❖ Name: The symbolic file name is the only information kept in human readable
form.
❖ Identifier: This unique tag, usually a number, identifies the file within the file
system; it is the non-human-readable name for the file.
❖ Type: This information is needed for systems that support different types of
files.
❖ Location: This information is a pointer to a device and to the location of the file
on that device.
❖ Size: The current size of the file (in bytes, words, or blocks) and possibly the
maximum allowed size are included in this attribute.
❖ Protection: Access-control information determines who can do reading,
writing, executing, and so on.
❖ Time, date, and user identification: This information may be kept for
creation, last modification, and last use. These data can be useful for protection,
security, and usage monitoring.
File Operations
Since a file is an abstract data type, we should define the operations that can be
performed on files. The operating system provides system calls to perform these
operations:
Create: the OS must find space in the file system and add an entry to the directory
Write: the OS must find the location of the file and usually keeps a write pointer that
indicates where the next write will occur
Read: the OS must find the file and usually keeps a read pointer that indicates where the
next read will occur
Reposition within a file: change the file position pointer to a given value (i.e. seek
a given location)
Delete: release the space allocated to a file and update the directory
Truncate: erase the contents of a file, but keep its attributes

❖ The OS maintains an open-file table with information about all open files.
❖ When a process finishes using a file, it calls a close() system call.
❖ The open-file table contains the following information for each file:
1. File pointer: stores the current read/write location within a file; unique to a
process accessing the file
2. File-open count: the number of processes accessing the file
3. Disk location of the file: to improve access speed, the location of the file
on disk is stored in memory
4. Access rights: indicates what operations a process is allowed to do to a
file

File Types

When we design a file system, we always consider whether the operating system
should recognize and support file types. If an operating system recognizes the type of a
file, it can then operate on the file in reasonable ways. A common technique for
implementing file types is to include the type as part of the file name. The name is split
into two parts—a name and an extension, usually separated by a period. Examples
include resume.docx, server.c, and ReaderThread.cpp.
Access methods
Files store information. When it is used, this information must be accessed and read
into computer memory. The information in the file can be accessed in the following ways,
1. Sequential Access
❖ The simplest access method is sequential access.
❖ Information in the file is processed in order, one record after the other and does
not skip any record in between.
❖ Problem: To reach a particular record, all the records before it must be read
first, so access to an arbitrary record is slow.
❖ Example: Editors and Compilers usually access files in this fashion.
❖ Operations
▪ A read operation, read next(), reads the next portion of the file and
automatically advances a file pointer, which tracks the I/O location.
▪ Similarly, the write operation, write next(), appends to the end of the file and
advances to the end of the newly written material (the new end of file).
▪ Such a file can also be reset to the beginning.
▪ Summary of operations:
read next
write next
reset
2. Direct Access or Relative Access or Random Access

❖ Another method is direct access (or relative access or random access).
❖ A record is accessed directly, without reading the records that precede it.
❖ Each record has its own address, which is used to access it directly.
❖ Records need not be in sequence or in adjacent locations on the storage medium.
❖ Any block of the file can be accessed.
❖ Used by editors to access contents randomly.


❖ For direct access, the file is viewed as a numbered sequence of blocks or records.
Thus, we may read block 40, then read block 53, and then write block 7. There are
no restrictions on the order of reading or writing for a direct-access file.
❖ Examples:
➢ Direct-access files are of great use for immediate access to large amounts
of information. Databases are often of this type. When a query concerning
a particular subject arrives, we compute which block contains the answer
and then read that block directly to provide the desired information.
❖ Operations
➢ For the direct-access method, the file operations must be modified to
include the block number as a parameter. Thus, we have read(n), where
n is the block number, rather than read next(), and write(n) rather than
write next().
➢ Summary of operations, where n is the relative block number:
read n
write n
position to n
rewrite n
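
On a UNIX-like system, read n and write n can be sketched with lseek() followed by read() or write(). The block size and the helper names read_block and write_block below are assumptions made for illustration; they are not part of any standard API.

```c
#include <sys/types.h>
#include <unistd.h>

#define BLOCK_SIZE 512   /* assumed block size, for illustration only */

/* Read relative block n of an open file into buf (BLOCK_SIZE bytes).
 * Returns the number of bytes read, or -1 on error. */
ssize_t read_block(int fd, long n, char *buf)
{
    /* "position to n": move the file offset to the start of block n */
    if (lseek(fd, (off_t)n * BLOCK_SIZE, SEEK_SET) == (off_t)-1)
        return -1;
    return read(fd, buf, BLOCK_SIZE);
}

/* Write BLOCK_SIZE bytes from buf into relative block n.
 * Returns the number of bytes written, or -1 on error. */
ssize_t write_block(int fd, long n, const char *buf)
{
    if (lseek(fd, (off_t)n * BLOCK_SIZE, SEEK_SET) == (off_t)-1)
        return -1;
    return write(fd, buf, BLOCK_SIZE);
}
```

Blocks can be read or written in any order, for example block 40, then block 53, then block 7, because each call positions the file offset itself.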

3. Indexed Access
❖ A combination of the sequential and direct access methods.
❖ A fast and efficient way to access large files.
❖ It involves the construction of an index for the file. The index, like an index in the
back of a book, contains pointers to the various blocks. To find a record in the
file, we first search the index and then use the pointer to access the file directly
and to find the desired record.

❖ With large files, the index file itself may become too large to be kept in memory.
One solution is to create an index for the index file. The primary index file
contains pointers to secondary index files, which point to the actual data items.
❖ Example: IBM’s ISAM (indexed sequential-access method)

Simulation of Sequential Access on Direct access File
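
A common way to simulate sequential access on a direct-access file is to keep a variable cp holding the current position: reset sets cp = 0, read next performs read cp and then cp = cp + 1, and write next performs write cp and then cp = cp + 1. The sketch below illustrates this on a toy in-memory file; the structure da_file, the sizes, and all function names are assumptions made for the example.

```c
#include <string.h>

#define DA_BLOCK_SIZE 16   /* illustrative block size */
#define DA_NUM_BLOCKS 8    /* illustrative file length in blocks */

/* A toy direct-access file: numbered blocks plus a current position. */
struct da_file {
    char blocks[DA_NUM_BLOCKS][DA_BLOCK_SIZE];
    int cp;                /* current position (relative block number) */
};

/* Direct-access primitive "read n". Returns 0, or -1 if n is out of range. */
int da_read(const struct da_file *f, int n, char *buf)
{
    if (n < 0 || n >= DA_NUM_BLOCKS) return -1;
    memcpy(buf, f->blocks[n], DA_BLOCK_SIZE);
    return 0;
}

/* Direct-access primitive "write n". Returns 0, or -1 if n is out of range. */
int da_write(struct da_file *f, int n, const char *buf)
{
    if (n < 0 || n >= DA_NUM_BLOCKS) return -1;
    memcpy(f->blocks[n], buf, DA_BLOCK_SIZE);
    return 0;
}

/* reset: cp = 0 */
void seq_reset(struct da_file *f) { f->cp = 0; }

/* read next: read cp; cp = cp + 1 */
int seq_read_next(struct da_file *f, char *buf)
{
    if (da_read(f, f->cp, buf) != 0) return -1;
    f->cp = f->cp + 1;
    return 0;
}

/* write next: write cp; cp = cp + 1 */
int seq_write_next(struct da_file *f, const char *buf)
{
    if (da_write(f, f->cp, buf) != 0) return -1;
    f->cp = f->cp + 1;
    return 0;
}
```

The reverse simulation, direct access on top of a purely sequential file, is far less efficient, because reaching block n means reading everything before it.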


Directory Overview:
A disk can be divided into regions (partitions), and a file system can be created on
each of them. Generally speaking, a volume is any entity that holds a file system. Each
volume that holds a file system must also record information about the files in it. This
information is kept in entries in a device directory or volume table of contents. The
device directory (or simply the directory) records details such as name, location, size,
and type for all files on that volume.
The directory can be viewed as a symbol table that translates file names into their
directory entries. The operations that must be performed on a directory are as follows:
1. Search for a file.
2. Create a file.
3. Delete a file.
4. List a directory.
5. Rename a file.
6. Traverse the file system.

Directory Structure
The most common schemes for defining the logical structure of a directory are
the following,
1. Single-Level Directory
● The simplest directory structure is the single-level directory. All files are
contained in the same directory, which is easy to support and understand.
Limitations
I. Because all files are in the same directory, they must have unique names. If two users
both call their data file test.txt, then the unique-name rule is violated.
II. Even a single user on a single-level directory may find it difficult to remember the
names of all the files as the number of files increases. Keeping track of so many
files is a problem.
2. Two-Level Directory:
Standard Solution : To avoid confusion of file names among different users, a common
approach is to establish separate directories for each user.
Structure: The two-level directory system has two directory levels above the files themselves:
1. Master File Directory (MFD): Located at the top level, it serves as a primary
directory.
2. User File Directory (UFD): Positioned at the second level, each user has their
dedicated UFD.
3. Actual Files: These reside at the third level within the respective UFDs.

User Access: When a user initiates a job or logs in, the system searches the MFD,
indexed by user names or account numbers. Each entry in the MFD points to the
corresponding user's UFD.
File Operations: File operations are limited to the user's UFD. Creating a file entails
searching only the user's UFD to check for name conflicts. Similarly, file deletion is
confined to the local UFD, minimizing the risk of accidentally deleting files with the same
name owned by other users.

Advantages and Disadvantages: This structure resolves name collisions, but it
isolates users from each other. Isolation is useful when users are independent, but it
hinders cooperation and file sharing. Some systems do not allow local user files to be
accessed by other users.

Search Path: The sequence of directories searched when a file is named is called the
"search path." It lets the system locate a file whether it belongs to the user or is one
of the system files.

3. Tree-Structured Directory:
Tree Directory Structure: The tree directory structure is the most common type,
consisting of a root directory from which all files and directories branch out. Each file in
the system has a unique path name.

Directory Format: Directories, including subdirectories, are treated as special files.
They all share the same internal format. A single bit in each directory entry specifies
whether the entry is a file (0) or a subdirectory (1). Special system calls are used for
directory creation and deletion.

Current Directory: Each process has a current directory that holds files of immediate
interest. When referencing a file, the system searches the current directory first. If the
required file isn't there, the user typically needs to specify a path name or use the
"change directory" system call to switch to the directory containing the file.

Path Names: Path names describe the route the operating system must follow to reach
a specific location. There are two types of path names:
1. Absolute Path Name: It starts from the root directory and spells out the entire
path, including all directory names.
2. Relative Path Name: Relative to the current directory, it defines a path to the
target location.

Deletion of a Directory: Deleting a directory depends on its contents.
A. Empty Directory: If a directory is empty, its entry in the parent directory can
simply be removed.
B. Non-Empty Directory: If a directory contains files or subdirectories, two
approaches can be taken:
● Empty First: Some systems only allow the deletion of empty directories. Users
must manually delete all files and subdirectories within before removing the
directory itself. This process may involve recursive deletions for subdirectories.
● Forceful Deletion: Certain systems, like UNIX with the "rm" command and its -r
option, can delete a directory along with all its files and subdirectories in a
single operation.

4. Acyclic-Graph Directory:
● Introduction: In contrast to the rigid tree structure, acyclic-graph directories
allow for greater flexibility in organizing files and directories. They enable
directories to share subdirectories and files, facilitating more complex
relationships.
● Shared Entries: In an acyclic graph structure, the same file or subdirectory can
exist in multiple directories. This sharing capability increases complexity but
offers advantages in terms of organization and resource management.

● Implementation:

a. Linking: A common method is to create directory entries called links. These links act
as pointers to other files or subdirectories. Links can be implemented as either absolute
or relative path names. When a reference to a file is made, the system checks the
directory entry. If it's marked as a link, the actual file's name is included in the link
information, allowing the system to resolve the link and locate the real file.

b. Duplication: Another approach involves duplicating all information about shared files
in both directories where they are linked. This means that both entries are identical and
equal. However, maintaining consistency when a file is modified becomes a challenge.

● Challenges:
● Multiple Path Names: A file may have multiple absolute path names,
complicating traversal and management.
● Deletion: Handling file deletion in a shared context presents challenges.
● Removing the file when anyone deletes it may leave dangling
pointers to the now-deleted file.
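● With symbolic links, deleting a link does not affect the original file;
only the link is removed. If the file itself is deleted, however, its
space is deallocated and the links are left dangling. Searching for
and removing such links can be resource-intensive.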
● An alternative approach is to preserve the file until all references to
it are deleted. This requires mechanisms to track references, such
as maintaining a list of references for each file. When the list is
empty, the file can be deleted.
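
UNIX hard links take a simplified form of this approach: instead of a full list of references, the file's metadata keeps a count of the directory entries that refer to it, and the file is removed only when the count drops to zero. The sketch below observes that count through the st_nlink field filled in by stat(). The helper names link_count and demo_links are made up for the example, and the paths passed in are assumed to be on a file system that supports hard links.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Number of directory entries (hard links) referring to the file at path,
 * or -1 on error. */
long link_count(const char *path)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    return (long)st.st_nlink;
}

/* Create file a, give it a second name b, then remove the name a.
 * counts[] receives the link count after each step. Returns 0 or -1. */
int demo_links(const char *a, const char *b, long counts[3])
{
    int fd = open(a, O_CREAT | O_WRONLY | O_TRUNC, 0644);
    if (fd < 0) return -1;
    close(fd);
    counts[0] = link_count(a);      /* one name refers to the file */

    if (link(a, b) != 0) return -1;
    counts[1] = link_count(a);      /* two names refer to the same file */

    if (unlink(a) != 0) return -1;
    counts[2] = link_count(b);      /* file survives under its other name */
    return 0;
}
```

Calling unlink() on the last remaining name brings the count to zero, at which point the system can release the file's space once no process still has it open.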

File System Structure

Disks provide most of the secondary storage on which file systems are maintained.
Two characteristics make them convenient for this purpose:
1. A disk can be rewritten in place.
2. A disk can access directly any block of information it contains.
File systems provide efficient and convenient access to the disk by allowing data to
be stored, located, and retrieved easily.

Layered Design of a File System


Each level in the design uses the features of lower levels to create new features for
use by higher levels.
Application Programs: This level contains the user code that makes a request.
Logical File System: The logical file system manages the directory
structure to provide the file-organization module with the information it needs,
given a symbolic file name.
File-Organization Module: The file-organization module knows about files and
their logical blocks and physical blocks. By knowing the type of file allocation used and
the location of the file, the file-organization module can translate logical block
addresses to physical block addresses for the basic file system to transfer.
Basic File System: The basic file system needs only to issue generic commands to
the appropriate device driver to read and write physical blocks on the disk. Each
physical block is identified by its numeric disk address.
I/O Control: The I/O control level consists of device drivers and interrupt handlers
that transfer information between the main memory and the disk system. It acts like a
translator: its input consists of high-level commands such as “read block 123,” and its
output consists of low-level, hardware-specific instructions that are used by the
hardware controller.
Devices: These are the actual hardware devices, such as disks.

Allocation methods
Many files can be stored on the same disk. The main problem is how to allocate
space to these files so that disk space is utilized effectively and files can be
accessed quickly. The following are the three major methods of allocating disk space
that are in wide use:
1. Contiguous Allocation:

Contiguous allocation requires that each file occupy a set of contiguous blocks
on the disk. Disk addresses define a linear ordering on the disk.
The directory entry for each file indicates the address of the starting block and
the length of the area allocated for this file.
Drawbacks:
1. External fragmentation.
2. It is difficult to find a large enough run of contiguous free blocks for a file.
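
Because the directory entry holds only a starting block and a length, translating a logical block number to a disk address under contiguous allocation is a single addition. A minimal sketch, in which the structure and function names are hypothetical:

```c
/* Directory entry under contiguous allocation (illustrative layout). */
struct contig_entry {
    long start;    /* address of the first block of the file */
    long length;   /* number of blocks allocated to the file */
};

/* Map logical block i of the file to its physical block number.
 * Returns -1 if i lies outside the file. */
long contig_physical(const struct contig_entry *e, long i)
{
    if (i < 0 || i >= e->length)
        return -1;
    return e->start + i;
}
```

For a file that starts at block 14 and is 3 blocks long, logical blocks 0, 1, and 2 map to physical blocks 14, 15, and 16, so both sequential and direct access are cheap.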

2. Linked Allocation:
Linked allocation solves all problems of contiguous allocation. With linked allocation,
each file is a linked list of disk blocks; the disk blocks may be scattered anywhere on
the disk.

o Advantages
- There is no external fragmentation with linked allocation, and any free block
on the free-space list can be used to satisfy a request.
- The size of a file need not be declared when the file is created. A file can
continue to grow as long as free blocks are available.
o Disadvantages
- Space is required for the pointers. If a pointer
requires 4 bytes out of a 512-byte block, then 0.78 percent of the disk is being
used for pointers, rather than for information.
- To reach a given block, the list must be traversed from the first block, so
linked allocation works well only for sequential access; direct access is slow.
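
Both disadvantages can be made concrete with a small sketch. Here the pointer stored in each disk block is modeled by an in-memory array next_block[], which is an assumption made to keep the example self-contained; on a real disk, every step of the loop would cost one block read. The function names are hypothetical.

```c
#define END_OF_FILE (-1)   /* marker used here for "no next block" */

/* Find the physical block holding logical block i of a file whose first
 * block is `first`. next_block[b] models the pointer stored in block b.
 * Returns END_OF_FILE if the file has fewer than i + 1 blocks. */
long linked_physical(const long *next_block, long first, long i)
{
    long b = first;
    if (i < 0)
        return END_OF_FILE;
    while (i > 0 && b != END_OF_FILE) {
        b = next_block[b];   /* follow one pointer per logical block skipped */
        i--;
    }
    return b;
}

/* Percentage of each block consumed by its pointer. */
double pointer_overhead_percent(int pointer_bytes, int block_bytes)
{
    return 100.0 * pointer_bytes / block_bytes;
}
```

With 4-byte pointers in 512-byte blocks, pointer_overhead_percent(4, 512) gives 0.78125, which is the 0.78 percent quoted above.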

3. Indexed Allocation:
Linked allocation solves the external-fragmentation and size-declaration problems
of contiguous allocation. However, it cannot support efficient direct access, since
the pointers to the blocks are scattered with the blocks themselves all over the disk
and must be retrieved in order.
Indexed allocation solves this problem by bringing all the pointers together into one
location: the index block.

Advantages
- There is no external fragmentation.
- Direct access is possible.
Disadvantages
- Wasted space within the index block: even a small file needs a whole index block.
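
A minimal sketch of an index block, assuming an illustrative capacity of 8 pointers and a marker value for unused entries (both are assumptions, as are the names):

```c
#define INDEX_ENTRIES 8   /* pointers per index block; illustrative */
#define UNUSED (-1)       /* marker for an index entry not in use */

/* The index block gathers all of a file's block pointers in one place. */
struct index_block {
    long ptr[INDEX_ENTRIES];
};

/* Direct access: logical block i is found with a single table lookup. */
long indexed_physical(const struct index_block *ib, long i)
{
    if (i < 0 || i >= INDEX_ENTRIES)
        return UNUSED;
    return ib->ptr[i];
}

/* Count unused entries: the space a small file wastes in its index block. */
int index_unused(const struct index_block *ib)
{
    int unused = 0;
    for (int k = 0; k < INDEX_ENTRIES; k++)
        if (ib->ptr[k] == UNUSED)
            unused++;
    return unused;
}
```

Unlike linked allocation, finding logical block i needs no traversal; the cost is that a file of one or two blocks still occupies a full index block.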

Disk Space Management

Free space management is a crucial function of operating systems, as it ensures
that storage devices are utilized efficiently and effectively.
The system keeps track of the free disk blocks so that it can allocate space to files
when they are created. Free space management is also needed to reuse the space
released when files are deleted.
The system maintains a free space list which keeps track of the disk blocks that are
not allocated to some file or directory. The free space list can be implemented
mainly as:
➢ Bitmap or Bit vector – A bitmap or bit vector is a series of bits
where each bit corresponds to a disk block. Each bit takes one of two values:
0 indicates that the block is allocated and 1 indicates that the block is
free.

Advantages :
- Simple to understand.
- Finding the first free block is efficient.
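
Finding the first free block is efficient because whole words of the bitmap can be tested at once: a word equal to 0 means all of its blocks are allocated and can be skipped. A sketch, assuming 32-bit words with block numbers assigned from the least significant bit upward (the layout and the function name are assumptions):

```c
#include <stdint.h>

/* Return the number of the first free block (bit value 1), or -1 if none.
 * Block i is stored in bit (i % 32) of word (i / 32). */
long first_free_block(const uint32_t *map, long nblocks)
{
    long nwords = (nblocks + 31) / 32;
    for (long w = 0; w < nwords; w++) {
        if (map[w] == 0)
            continue;                      /* all 32 blocks allocated: skip */
        for (int bit = 0; bit < 32; bit++) {
            long block = w * 32 + bit;
            if (block >= nblocks)
                return -1;                 /* only padding bits remain */
            if (map[w] & (UINT32_C(1) << bit))
                return block;
        }
    }
    return -1;
}
```

The result equals (bits per word) × (number of all-zero words skipped) + (offset of the first 1 bit in the next word).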

➢ Linked List – In this approach, the free disk blocks are linked together, i.e.
a free block contains a pointer to the next free block. The block number of
the first free disk block is stored at a separate location on disk and is also
cached in memory.
A drawback of this method is the I/O required to traverse the free-space list.

➢ Grouping – This approach stores the addresses of free blocks in the first free
block. The first free block stores the addresses of some, say n, free blocks. Out of
these n blocks, the first n-1 blocks are actually free and the last block contains
the addresses of the next n free blocks. An advantage of this approach is that the
addresses of a group of free disk blocks can be found easily.
➢ Counting – This approach stores the address of the first free disk block and
a number n of free contiguous disk blocks that follow the first block. Every
entry in the list would contain:
1. Address of first free disk block
2. A number n
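
The counting representation can be sketched by scanning a free map and emitting one (address, n) entry per run of free blocks. The free map is given here as one byte per block (1 = free) only to keep the example short; the structure and function names are hypothetical.

```c
#include <stddef.h>

/* One entry of the free-space list under the counting scheme. */
struct free_extent {
    long first;       /* address of the first free block in the run */
    long following;   /* n: free contiguous blocks that follow the first */
};

/* Build the list from a map with one byte per block (1 = free, 0 = allocated).
 * Writes at most max entries to out and returns how many were written. */
size_t build_extents(const unsigned char *is_free, long nblocks,
                     struct free_extent *out, size_t max)
{
    size_t count = 0;
    long b = 0;
    while (b < nblocks && count < max) {
        if (!is_free[b]) {
            b++;
            continue;
        }
        long start = b;
        while (b < nblocks && is_free[b])
            b++;                          /* extend the run of free blocks */
        out[count].first = start;
        out[count].following = b - start - 1;
        count++;
    }
    return count;
}
```

When free space comes in long runs, this list is much shorter than a list with one entry per free block.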

System Calls:

CREATE:
• This is used to create a file.
• Two steps are necessary to create a file.
• First, space in the file system must be found for the file.
• Second, an entry for the new file must be made in the directory.
• The open system call opens an existing file, whereas the creat system call creates a new file.
• Syntax:

fd = creat(pathname, modes);

fd: file descriptor for the newly created file
modes: access rights (owner, group, others)

OPEN:
• The open() system call opens a file for reading, writing, or both.
• When a file has been opened, its entry is added to the open-file table. The table
also keeps an open count for each file to indicate how many processes
have the file open.
• The return value is a file descriptor, a small non-negative integer that is used in
subsequent system calls to refer to the open file.
• Syntax:

fd = open(pathname, flags, modes);

flags: access mode (O_RDONLY, O_WRONLY, or O_RDWR), optionally combined with flags such as O_APPEND and O_CREAT
modes: access rights for a newly created file; needed only when O_CREAT is given
• Example:

fd = open("m.txt", O_CREAT | O_RDWR, 0777);

CLOSE:
• This closes a file when it is no longer needed.
• Each close() decrements the open count. When the count reaches zero, the
file is no longer in use and its entry is removed from the open-file table.
• When the close system call completes, the user file descriptor table entry is
empty.
• Syntax:

close(fd);

READ:
• Reads data from an open file descriptor.
• The system keeps a read pointer to the location in the file where the next read is
to take place. Once the read has taken place, the read pointer is updated.
• Syntax:

number = read(fd, buffer, count);

buffer: address of a data structure in the user process
count: number of bytes the user wants to read

• This attempts to read up to count bytes from file descriptor fd into buffer.
• On success, the number of bytes read is returned.
• Returns 0 when the end of the file is reached.
• Returns -1 when an error occurs.

WRITE:
• Data is written to an open file.
• For a regular file, the write starts at the file’s current offset.
• The system must keep a write pointer to the location in the file where the next
write is to take place. The write pointer must be updated whenever a write
occurs.
• The return value is usually equal to the count argument; a smaller value means
that only part of the data was written.
• Syntax:

number = write(fd, buffer, count);

• This attempts to write up to count bytes from buffer to file descriptor fd.
• On success, the number of bytes written is returned.
• Returns 0 if nothing is written and -1 if an error occurs.
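
The open, read, write, and close calls are typically used together. The sketch below copies one file to another; the function name copy_file, the 4096-byte buffer, and the 0644 permission bits are choices made for the example rather than requirements of the calls.

```c
#include <fcntl.h>
#include <unistd.h>

/* Copy the file at src to dst. Returns the number of bytes copied,
 * or -1 on error. */
long copy_file(const char *src, const char *dst)
{
    char buf[4096];
    long total = 0;
    ssize_t n;

    int in = open(src, O_RDONLY);
    if (in < 0)
        return -1;
    int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (out < 0) {
        close(in);
        return -1;
    }

    /* read() returns 0 at end of file and -1 on error */
    while ((n = read(in, buf, sizeof buf)) > 0) {
        ssize_t done = 0;
        while (done < n) {
            /* write() may accept fewer bytes than requested, so loop */
            ssize_t w = write(out, buf + done, (size_t)(n - done));
            if (w <= 0) {
                close(in);
                close(out);
                return -1;
            }
            done += w;
        }
        total += n;
    }

    close(in);
    close(out);
    return n < 0 ? -1 : total;
}
```

Each read() and write() advances the file offset kept for its descriptor, so no explicit positioning is needed for this sequential copy.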
LSEEK :
The lseek system call in Unix-like operating systems is used to reposition the file offset
of an open file descriptor. This allows you to move the read/write position
within a file, which is essential for random access or seeking within files.
SYNTAX:
#include <unistd.h>
off_t lseek(int fd, off_t offset, int whence);
● fd: The file descriptor of the open file you want to seek within.
● offset: The offset by which to move the file pointer.
● whence: A reference point for the offset movement, which can take one of three
values:
1. SEEK_SET: Set the file pointer to offset bytes from the beginning of the
file.
2. SEEK_CUR: Set the file pointer to the current position plus offset bytes.
3. SEEK_END: Set the file pointer to offset bytes from the end of the file.
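
As a small illustration of the three whence values, the sketch below finds the size of an open file by seeking to its end and then restores the original position. The function name file_size is hypothetical.

```c
#include <sys/types.h>
#include <unistd.h>

/* Return the size in bytes of the file open on fd, leaving the current
 * offset unchanged. Returns -1 on error (for example, if fd is a pipe). */
off_t file_size(int fd)
{
    off_t here = lseek(fd, 0, SEEK_CUR);   /* remember the current offset */
    if (here == (off_t)-1)
        return -1;

    off_t end = lseek(fd, 0, SEEK_END);    /* offset 0 from the end = size */
    if (end == (off_t)-1)
        return -1;

    if (lseek(fd, here, SEEK_SET) == (off_t)-1)   /* restore the offset */
        return -1;
    return end;
}
```

On success, lseek returns the resulting offset measured from the beginning of the file, which is why a seek of 0 bytes from SEEK_CUR reports the current position.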

STAT
The stat system call in Unix-like operating systems is used to retrieve information about
a file, such as its metadata and attributes. It provides a way to access details like file
size, permissions, timestamps, and more. Here's the syntax and usage in points:

Syntax:
#include <sys/stat.h>
int stat(const char *path, struct stat *buf);
● path: A string representing the path to the file whose information you want to
retrieve.
● buf: A pointer to a `struct stat` that will be filled with the file's attributes after the
`stat` call.

Usage:
1. Include the <sys/stat.h> header.
2. Define a struct stat variable to store file attributes.
3. Use the stat function to retrieve information about a file specified by path.
4. The file's attributes will be stored in the `struct stat` pointed to by buf.
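
The usage steps above can be sketched as follows. The function name print_attributes and the choice of fields to print are illustrative; `struct stat` contains more fields than are shown here.

```c
#include <stdio.h>
#include <sys/stat.h>

/* Print a few attributes of the file at path.
 * Returns 0 on success, or -1 if stat fails. */
int print_attributes(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0) {
        perror("stat");                 /* e.g. the file does not exist */
        return -1;
    }

    printf("size:        %lld bytes\n", (long long)st.st_size);
    printf("permissions: %o\n", (unsigned)(st.st_mode & 0777));
    printf("links:       %lu\n", (unsigned long)st.st_nlink);
    printf("type:        %s\n",
           S_ISDIR(st.st_mode) ? "directory" :
           S_ISREG(st.st_mode) ? "regular file" : "other");
    return 0;
}
```

stat returns 0 on success and -1 on failure, so the return value should be checked before any field of the structure is used.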

IOCTL
The ioctl system call in Unix-like operating systems provides a general-purpose
interface for controlling various devices and performing I/O operations that don't fit into
standard read and write operations. It allows applications to issue commands and
request information from device drivers. Here's the syntax and usage in points:
Syntax:
#include <sys/ioctl.h>
int ioctl(int fd, unsigned long request, ...);
● fd: The file descriptor associated with the device or file to which the operation
applies.
● request: An unsigned long integer specifying the request code, which determines
the operation to be performed.
● ...: Additional arguments may be required based on the specific `request` code.

Usage:
1. Include the <sys/ioctl.h> header.
2. Open a file or device using open() to obtain a file descriptor (fd).
3. Use the ioctl function to send a request to the device or file.
4. The request code determines the specific operation, and additional arguments
may be required.
5. The return value of ioctl depends on the specific request and can vary widely, so
check the documentation for the requested operation.
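
As one concrete request, the sketch below asks how many bytes are waiting to be read on a descriptor by using FIONREAD. This request code is not part of POSIX; it is assumed here to be available through <sys/ioctl.h>, as it is on Linux and BSD-derived systems. The function name bytes_available is hypothetical.

```c
#include <sys/ioctl.h>
#include <unistd.h>

/* Return the number of bytes that can be read from fd without blocking,
 * or -1 if the request fails or is not supported for this descriptor. */
int bytes_available(int fd)
{
    int n = 0;

    /* The third argument depends on the request: FIONREAD expects
     * a pointer to an int that receives the byte count. */
    if (ioctl(fd, FIONREAD, &n) == -1)
        return -1;
    return n;
}
```

For example, after 5 bytes have been written to the write end of a pipe, calling bytes_available on the read end is expected to report 5 on systems that support this request for pipes.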
